COMPOSITIONS AND METHODS FOR INHIBITING GENE EXPRESSION

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the priority benefit of U.S. Patent Application
09/545,574, filed April 7, 2000, pending, which is hereby incorporated herein by
reference in its entirety.

STATEMENT OF RIGHTS TO INVENTIONS MADE UNDER
FEDERALLY SPONSORED RESEARCH
Not applicable.

TECHNICAL FIELD 
This invention is in the field of genetic analysis. Specifically, the invention
relates to the generation of a eukaryotic vector that allows bi-directional
transcription of a transgene to yield both sense and antisense RNA transcripts from
the same transgene. The compositions and methods embodied in the present
invention are particularly useful for targeted inhibition of gene expression in a
eukaryotic cell.

BACKGROUND OF THE INVENTION

The structure and biological behavior of a cell are determined by the pattern of
gene expression within that cell at a given time. Perturbations of gene expression
have long been acknowledged to account for a vast number of diseases, including
numerous forms of cancer, vascular diseases, and neuronal and endocrine diseases.
Abnormal expression patterns, in the form of amplification, deletion, gene
rearrangements, and loss or gain of function mutations, are now known to lead to
aberrant behavior of a disease cell. Aberrant gene expression has also been noted as
a defense mechanism of certain organisms to ward off the threat of pathogens.

One of the major challenges of genetic engineering has been to regulate the
expression of targeted genes that are implicated in a wide diversity of physiological
responses. While overexpression of an exogenously introduced transgene in a
eukaryotic cell is relatively straightforward, targeted inhibition of specific genes has
been more difficult to achieve. Traditional approaches for suppressing gene
expression, including site-directed gene disruption, antisense RNA or co-suppressor
injection, require complex genetic manipulations or heavy dosages of suppressors
that often exceed the toxicity tolerance level of the host cell.

Recently, a new technique, "double-stranded RNA interference", has
emerged in the study of gene silencing. Several research groups have demonstrated
a marked inhibition of specific nuclear gene expression in a wide range of
eukaryotes by introduction into cells of dsRNA fragments that bear sequence
homology with the nuclear gene. For instance, Fire et al. (1998) Nature 395: 854
reported the success of gene-specific interference in C. elegans that was mediated by
ingested E. coli carrying a prokaryotic vector capable of producing both sense and
antisense RNAs of the selected C. elegans genes. Misquitta et al. demonstrated the
targeted disruption of the nautilus gene in Drosophila melanogaster by injecting into
the Drosophila embryo multiple copies of nautilus dsRNA. See Misquitta et al.
(1999) PNAS U.S.A. 96: 1451-1456. Studies by Ngo et al. (1998) Proc. Natl. Acad.
Sci. U.S.A., 96: 1451-1456 confirmed that dsRNA interference also occurs in
certain protozoan species. Earlier studies by Cogoni et al. and Hamilton et al.
suggested that formation of dsRNA plays a pivotal role in gene silencing in the fungus
Neurospora crassa and in plants. See Cogoni et al. (1999) Nature 399: 166-169;
Hamilton et al. (1999) Science 286: 950-952; and Waterhouse et al. (1999) PNAS
USA. 95: 13959-13964. More recent investigations by Wargelius et al. revealed
that this phenomenon is also conserved in vertebrates such as the zebrafish.
Wargelius et al. Biochem. Biophys. Res. Commun. 263: 156-161.

Current techniques for achieving RNA-mediated gene silencing include: (a)
use of prokaryotic vectors capable of transcribing both sense and antisense RNA
(Fire et al. (1998) Nature 395: 854); (b) in vitro transcription of individual strands of
a selected gene followed by annealing the transcribed sense and antisense RNAs
(see, e.g. Misquitta et al. (1999) PNAS USA. 96: 1451-1456); and possibly (c)
virus-induced gene silencing (see, e.g. Angell et al. (1997) EMBO Journal 16:
3675-3684; Angell et al. (1999) Plant Journal 20: 357-362). However, these
methods bear a number of intrinsic limitations. First, none of these methods
employs gene delivery vehicles that are applicable for consistent and persistent
inhibition of gene expression in a eukaryote. Second, these existing methods do not
necessarily result in production of a substantially homogeneous population of
dsRNAs. Notably, the in vitro preparation of double-stranded RNAs by transcribing
and annealing sense RNA transcripts to antisense transcripts is time consuming,
labor intensive, and not amenable to mass production or high-throughput analyses.

Thus, there remains a considerable need for compositions and methods to
effect dsRNA-mediated gene silencing. An ideal reagent would be a self-replicating
vector that is (a) capable of autonomous replication and expression of a selected
transgene in a eukaryotic cell; and (b) capable of yielding both sense and antisense
RNA transcripts from the same transgene, so as to effect production of dsRNA
transcripts in a eukaryotic host cell. The present invention satisfies these needs and
provides related advantages as well.

SUMMARY OF THE INVENTION 
A principal aspect of the present invention is the design of a eukaryotic
recombinant vector to effect gene silencing in a eukaryotic cell that is susceptible to
dsRNA-mediated reduction of gene expression. Such a vector allows bi-directional
transcription of a transgene to yield both sense and antisense RNA transcripts of the
same transgene in a eukaryotic cell. While not being bound to any one theory, the
production of dsRNAs induces transcriptional and/or post-transcriptional gene
silencing in the host cell. Accordingly, the present invention provides a recombinant
vector having the following unique characteristics: it comprises a viral replicon
having two overlapping transcription units arranged in an opposing orientation and
flanking a transgene of interest, wherein the two overlapping transcription units
yield both sense and antisense RNA transcripts from the same transgene fragment in
a eukaryotic host cell.

In one aspect of this embodiment, each of the overlapping transcription units
of the vector comprises a promoter and a terminator that are arranged in one of the
configurations shown in Figure 2(a)-(d). The promoter can be constitutive or
inducible; it can be active in all tissues and cell types of an organism or operative
only in selected tissues (i.e. tissue-specific).

WO 01/77350 PCT/US01/11436

In another aspect, the recombinant vector comprises a viral replicon that is
derived from a DNA virus. Such DNA viruses can be selected from the group
consisting of Geminivirus, Caulimoviridae, Badnaviridae, Circoviridae,
Circinoviridae, Parvoviridae, Papovaviridae, Polyomaviridae, Adenoviridae,
Herpesviridae, Poxviridae, Iridoviridae, Baculoviridae, Hepadnaviridae,
Retroviridae, Gyrovirus, Nanovirus, and African Swine Fever virus.

In yet another aspect, the subject vector is capable of autonomous replication
in a eukaryotic cell.

In still another aspect, the subject vector is capable of inhibiting expression
of genes endogenous to a eukaryotic host cell. Non-limiting representative
eukaryotic cells whose gene expression can be inhibited upon introduction of the
subject vectors are fungi, yeast cells, plant cells, insect, avian, mammalian or other
animal cells. Preferably, the vectors effect a reduced expression of an endogenous
gene that is substantially homologous to the transgene contained in the overlapping
transcription units of the vectors. More preferably, delivery of the vectors into a
suitable host cell results in a phenotypic change of the host cell. In certain preferred
embodiments, the endogenous gene is native to the host cell. The endogenous gene
can also be heterologous to the host cell. In some embodiments, the endogenous
gene is a pathogenic gene derived from one or more members of the group
consisting of virus, bacterium, fungus, and protozoa. The transgene carried in the
vector can be a nucleotide sequence that encodes a membrane protein, a cytosolic
protein, a secreted protein, a nuclear protein, or a chaperon protein.

The present invention also provides host cells transformed with the invention
vectors. The present invention further provides a transgenic plant comprising a
eukaryotic recombinant vector of the present invention.

Also provided by the present invention is a kit for generating a double-stranded
RNA transcript in a eukaryotic cell that contains the subject vectors in
suitable packaging.

Further embodied in the present invention is a method of inhibiting
expression of an endogenous gene present in a eukaryotic cell. The method
involves: (a) providing a eukaryotic recombinant vector containing a transgene
that is substantially homologous to the endogenous gene; (b) introducing the
eukaryotic recombinant vector into the eukaryotic cell; and (c) culturing the
eukaryotic cell of (b) under conditions favorable for expression of both sense and
antisense RNA transcripts from the transgene that is contained in the transcription
units of the vector, and thereby inhibiting expression of the corresponding
endogenous gene in the eukaryotic cell.

Also included in the present invention is a method of identifying a biological
function(s) of an endogenous gene of interest in a eukaryotic cell by selectively
inhibiting the expression of the endogenous gene. The method comprises: (a)
providing a eukaryotic recombinant vector containing a transgene that is
substantially homologous to the endogenous gene; (b) introducing the eukaryotic
recombinant vector of (a) into the eukaryotic cell; (c) culturing the eukaryotic cell
of (b) under conditions favorable for expression of both sense and antisense RNA
transcripts from the transgene contained in the eukaryotic recombinant vector and
thereby inhibiting expression of the endogenous gene in the eukaryotic cell; and (d)
determining one or more phenotypic changes in the eukaryotic cell that correlate
with the inhibited expression of the endogenous gene, thereby identifying the
biological function(s) of the endogenous gene in the eukaryotic cell. In essence, the
subject methods allow the creation of a transient or more long-term gene-specific
knock-out system for analyzing the biological function of any endogenous gene of
interest.

BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a schematic representation of the process for production of
dsRNA transcripts by a subject vector containing two overlapping transcription
units.

Figure 2 (a)-(d) depict four different configurations of the overlapping
transcription units of the subject vectors.

Figure 3 is a schematic representation of an exemplary construct MSVLSB-6.

Figure 4 depicts the nucleotide sequence of the vector pMSVLSB-1 (SEQ ID
NO:9) described in Examples 1-2.

Figure 5 depicts the nucleotide sequence of the vector pMSVLSB-2 (SEQ ID
NO:10) described in Examples 1-2.

Figure 6 depicts the nucleotide sequence of the vector pMSVLSB-3 (SEQ ID
NO:11) described in Examples 1-2.

Figure 7 depicts the nucleotide sequence of the vector pMSVLSB-4 (SEQ ID
NO:12) described in Examples 1-2.

Figure 8 depicts the nucleotide sequence of the vector pMSVLSB-5 (SEQ ID
NO:13) described in Examples 1-2.

Figure 9 depicts the nucleotide sequence of the vector pMSVLSB-6 (SEQ ID
NO:14) described in Examples 1-2.

MODES FOR CARRYING OUT THE INVENTION
Throughout this disclosure, various publications, patents and published
patent specifications are referenced by an identifying citation. The disclosures of
these publications, patents and published patent specifications are hereby
incorporated by reference into the present disclosure to more fully describe the state
of the art to which this invention pertains.



General Techniques: 

The practice of the present invention will employ, unless otherwise
indicated, conventional techniques of immunology, biochemistry, chemistry,
molecular biology, microbiology, cell biology, genomics and recombinant DNA,
which are within the skill of the art. See, e.g., Matthews, PLANT VIROLOGY, 3rd
edition (1991); Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A
LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN
MOLECULAR BIOLOGY (F. M. Ausubel, et al. eds., (1987)); the series
METHODS IN ENZYMOLOGY (Academic Press, Inc.): PCR 2: A PRACTICAL
APPROACH (M.J. MacPherson, B.D. Hames and G.R. Taylor eds. (1995)), Harlow
and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL, and
ANIMAL CELL CULTURE (R.I. Freshney, ed. (1987)).

As used in the specification and claims, the singular forms "a", "an" and "the"
include plural references unless the context clearly dictates otherwise. For example,
the term "a cell" includes a plurality of cells, including mixtures thereof.

Definitions:

A "plant cell" refers to the structural and physiological unit of plants,
consisting of a protoplast and the cell wall.

A "protoplast" is an isolated cell without cell walls, having the potency for
regeneration into cell culture, tissue or whole plant.

A "host cell" includes an individual cell or cell culture which can be or has
been a recipient for vector(s) or for incorporation of nucleic acid molecules and/or
proteins. Host cells include progeny of a single host cell, and the progeny may not
necessarily be completely identical (in morphology or in genomic or total DNA
complement) to the original parent cell due to natural, accidental, or deliberate
mutation. A host cell includes cells transfected in vivo with a polynucleotide(s) of
this invention.

The terms "polynucleotide", "nucleotides" and "oligonucleotides" are used
interchangeably. They refer to a polymeric form of nucleotides of any length, either
deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have
any three-dimensional structure, and may perform any function, known or unknown.
The following are non-limiting examples of polynucleotides: coding or non-coding
regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons,
introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA,
recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated
DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and
primers. A polynucleotide may comprise modified nucleotides, such as methylated
nucleotides and nucleotide analogs. If present, modifications to the nucleotide
structure may be imparted before or after assembly of the polymer. The sequence of
nucleotides may be interrupted by non-nucleotide components. A polynucleotide may
be further modified after polymerization, such as by conjugation with a labeling
component.

A "gene" refers to a polynucleotide containing at least one open reading
frame that is capable of encoding a particular protein after being transcribed and
translated.

"Genes of a specific developmental origin" refer to genes expressed at
certain but not all developmental stages. For instance, a gene may be of embryonic
or adult origin depending on the stage during which the gene is expressed.

A "disease-associated" or "disease-causing" gene refers to any gene which
yields transcription or translation products at an abnormal level or in an abnormal
form in cells derived from disease-affected tissues compared with tissues or cells of a
control. It may be a gene that becomes expressed at an abnormally high level; it may
be a gene that becomes expressed at an abnormally low level, where the altered
expression correlates with the occurrence and/or progression of the disease. A
disease-associated gene also refers to a gene possessing mutation(s) or genetic variation
that is directly responsible or is in linkage disequilibrium with gene(s) that are
responsible for the etiology of a disease. The transcribed or translated products may be
known or unknown, and may be at a normal or abnormal level.

A gene "database" denotes a set of stored data which represent a collection
of sequences including nucleotide and peptide sequences, which in turn represent a
collection of biological reference materials.

As used herein, "expression" refers to the process by which a polynucleotide
is transcribed into mRNA and/or the process by which the transcribed mRNA (also
referred to as "transcript") is subsequently translated into peptides,
polypeptides, or proteins. The transcripts and the encoded polypeptides are
collectively referred to as gene product. If the polynucleotide is derived from
genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.

"Differentially expressed", as applied to a nucleotide sequence or polypeptide
sequence in a subject, refers to over-expression or under-expression of that sequence
when compared to that detected in a control. Underexpression also encompasses
absence of expression of a particular sequence as evidenced by the absence of
detectable expression in a test subject when compared to a control.

"Differential expression" refers to alterations in the abundance or the
expression pattern of a gene product.

A "primer" is a short polynucleotide, generally with a free 3'-OH group, that
binds to a target or "template" potentially present in a sample of interest by
hybridizing with the target, and thereafter promoting polymerization of a
polynucleotide complementary to the target.

The term "hybridize" as applied to a polynucleotide refers to the ability of
the polynucleotide to form a complex that is stabilized via hydrogen bonding
between the bases of the nucleotide residues in a hybridization reaction. The
hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or
in any other sequence-specific manner. The complex may comprise two strands
forming a duplex structure, three or more strands forming a multi-stranded complex,
a single self-hybridizing strand, or any combination of these. The hybridization
reaction may constitute a step in a more extensive process, such as the initiation of a
PCR reaction, or the enzymatic cleavage of a polynucleotide by a ribozyme.

Hybridization can be performed under conditions of different "stringency".
Relevant conditions include temperature, ionic strength, time of incubation, the
presence of additional solutes in the reaction mixture such as formamide, and the
washing procedure. Higher stringency conditions are those conditions, such as
higher temperature and lower sodium ion concentration, which require higher
minimum complementarity between hybridizing elements for a stable hybridization
complex to form. In general, a low stringency hybridization reaction is carried out at
about 40 °C in 10 x SSC or a solution of equivalent ionic strength/temperature. A
moderate stringency hybridization is typically performed at about 50 °C in 6 x SSC,
and a high stringency hybridization reaction is generally performed at about 60 °C in
1 x SSC.
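
For illustration only, the three approximate stringency conditions quoted above can be collected in a small lookup table. The sketch below is not part of the disclosed methods; the names `HybridizationCondition`, `STRINGENCY` and `is_more_stringent` are hypothetical, and the comparison considers only temperature and salt concentration, ignoring incubation time, formamide and washing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridizationCondition:
    temperature_c: float  # approximate temperature in degrees Celsius
    ssc_strength: float   # salt concentration as a multiple of 1 x SSC


# Approximate values taken from the text; names are illustrative assumptions.
STRINGENCY = {
    "low": HybridizationCondition(temperature_c=40.0, ssc_strength=10.0),
    "moderate": HybridizationCondition(temperature_c=50.0, ssc_strength=6.0),
    "high": HybridizationCondition(temperature_c=60.0, ssc_strength=1.0),
}


def is_more_stringent(a: str, b: str) -> bool:
    """Return True if condition `a` uses a higher temperature and lower salt than `b`."""
    cond_a, cond_b = STRINGENCY[a], STRINGENCY[b]
    return (cond_a.temperature_c > cond_b.temperature_c
            and cond_a.ssc_strength < cond_b.ssc_strength)
```

Under this simplified ordering, "high" is more stringent than "moderate", which in turn is more stringent than "low".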

When hybridization occurs in an antiparallel configuration between two
single-stranded polynucleotides, the reaction is called "annealing" and those
polynucleotides are described as "complementary". A double-stranded
polynucleotide can be "complementary" or "homologous" to another polynucleotide,
if hybridization can occur between one of the strands of the first polynucleotide and
the second. "Complementarity" or "homology" (the degree that one polynucleotide
is complementary with another) is quantifiable in terms of the proportion of bases in
opposing strands that are expected to form hydrogen bonding with each other,
according to generally accepted base-pairing rules.
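
As a minimal sketch of how the proportion described above might be computed, the following assumes two strands of equal length, both written 5' to 3', aligned in an antiparallel configuration without gaps, and counts only Watson-Crick pairs (wobble and Hoogsteen pairing are deliberately ignored). The function name `complementarity` is hypothetical and is not taken from the specification.

```python
# Watson-Crick pairs for DNA and RNA bases (a simplifying assumption).
WATSON_CRICK = {
    ("A", "T"), ("T", "A"),
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
}


def complementarity(strand_a: str, strand_b: str) -> float:
    """Fraction of positions that pair when strand_b is aligned antiparallel to strand_a.

    Both strands are given 5' to 3' and must have the same non-zero length.
    """
    a, b = strand_a.upper(), strand_b.upper()
    if not a or len(a) != len(b):
        raise ValueError("strands must be non-empty and of equal length")
    # Reverse strand_b so that its 3' end faces the 5' end of strand_a.
    paired = sum((x, y) in WATSON_CRICK for x, y in zip(a, reversed(b)))
    return paired / len(a)
```

For example, `complementarity("AUGC", "GCAU")` evaluates to 1.0 because every base pairs, whereas `complementarity("AUGC", "GCAA")` evaluates to 0.75 because one of the four positions is mismatched.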

In the context of polynucleotides, a "linear sequence" or a "sequence" is an
order of nucleotides in a polynucleotide in a 5' to 3' direction in which residues that
neighbor each other in the sequence are contiguous in the primary structure of the
polynucleotide. A "partial sequence" is a linear sequence of part of a polynucleotide
which is known to comprise additional residues in one or both directions.

The terms "cytosolic", "nuclear" and "secreted" as applied to cellular
proteins specify the extracellular and/or subcellular location in which the cellular
protein is mostly localized. Certain proteins are "chaperons", capable of
translocating back and forth between the cytosol and the nucleus of a cell.

A "subject" as used herein refers to a biological entity containing expressed
genetic materials. The biological entity can be a plant, an animal, or a
microorganism, including bacteria, viruses, fungi, and protozoa. Tissues, cells and
their progeny of a biological entity obtained in vivo or cultured in vitro are also
encompassed.

A "control" is an alternative subject or sample used in an experiment for
comparison purposes. A control can be "positive" or "negative". For example,
where the purpose of the experiment is to detect a differentially expressed transcript
or polypeptide in cell or tissue affected by a disease of concern, it is generally
preferable to use a positive control (a subject or a sample from a subject, exhibiting
such differential expression and syndromes characteristic of that disease), and a
negative control (a subject or a sample from a subject lacking the differential
expression and clinical syndrome of that disease).

"Heterologous" means derived from a genotypically distinct entity from the
rest of the entity to which it is being compared. For example, a promoter removed
from its native coding sequence and operatively linked to a coding sequence other
than the native sequence is a heterologous promoter.

A "cell line" or "cell culture" denotes bacterial, plant, insect or higher
eukaryotic cells grown or maintained in vitro. The descendants of a cell may not be
completely identical (either morphologically, genotypically, or phenotypically) to
the parent cell.

A "vector" is a nucleic acid molecule, preferably self-replicating, which
transfers an inserted nucleic acid molecule into and/or between host cells. The term
includes vectors that function primarily for insertion of a DNA or RNA into a cell,
replication vectors that function primarily for the replication of DNA or RNA,
and expression vectors that function for transcription and/or translation of the DNA
or RNA. Also included are vectors that provide more than one of the above
functions.

An "expression vector" is a polynucleotide which, when introduced into an
appropriate host cell, can be transcribed and translated into a polypeptide(s). An
"expression system" usually connotes a suitable host cell comprising an expression
vector that can function to yield a desired expression product.

A "replicon" refers to a polynucleotide comprising an origin of replication
(generally referred to as an ori sequence) which allows for replication of the
polynucleotide in an appropriate host cell. Examples of replicons include episomes
(such as plasmids), as well as chromosomes (such as the nuclear or mitochondrial
chromosomes).

A "transcription unit" is a DNA segment capable of directing transcription of
a gene or fragment thereof. Typically, a transcription unit comprises a promoter
operably linked to a gene or a DNA fragment that is to be transcribed, and optionally
regulatory sequences located either upstream or downstream of the initiation site or
the termination site of the transcribed gene or fragment.

Vectors of the present invention 

A central aspect of the present invention is the design of a recombinant
vector suited for bi-directional transcription of a transgene to yield both sense and
antisense RNA transcripts of the transgene in a eukaryotic cell. The invention
vectors are particularly suited for mediating nuclear gene silencing in a variety of
biological systems. Distinguished from the previously described DNA vectors, the
subject vectors have the following unique characteristics: (a) the vector replicates
and directs expression of a transgene in a eukaryotic cell; and (b) the vector
comprises a replicon having two overlapping transcription units arranged in an
opposing orientation and flanking a transgene of interest, wherein the two
overlapping transcription units yield both sense and antisense RNA transcripts from
the same transgene in a eukaryotic host cell.

Several factors apply to the design of vectors having the above-mentioned
characteristics. First, the vector comprises a replicon having an origin of replication
(generally referred to as an ori sequence) which permits replication of the vector in a
eukaryotic host cell. A preferred replicon is one comprising viral sequences capable
of directing autonomous replication of the vector in an appropriate host cell.
Non-limiting examples of viral replicons include sequences derived from DNA viruses
such as Geminivirus, Caulimoviridae, Badnaviridae, Circoviridae, Circinoviridae,
Parvoviridae, Papovaviridae, Polyomaviridae, Adenoviridae, Herpesviridae,
Poxviridae, Iridoviridae, Baculoviridae, Hepadnaviridae, Retroviridae, Gyrovirus,
Nanovirus, and African Swine Fever virus, or the like. In addition to the replication
origin, a replicon typically carries a transcription unit that directs transcription of a
transgene or a fragment thereof to yield a plurality of RNA transcripts.

A second consideration in designing the subject vector is to select two
overlapping transcription units. By "overlapping" is meant that the two transcription
units direct transcription of both DNA strands of the same transgene to yield a
plurality of partially or perfectly double-stranded RNA transcripts. The two
overlapping transcription units are typically arranged in an opposing orientation so
that each unit can drive transcription of one of the complementary strands from the
same transgene, and thus facilitate the generation of double-stranded RNA
transcripts. Elements within a transcription unit include but are not limited to
promoter regions, enhancer regions, repressor binding regions, transcription initiation
sites, ribosome binding sites, translation initiation sites, protein encoding regions and
introns, and termination sites for transcription and translation. Preferred transcription
units are arranged in a configuration shown in Figure 2(a)-(d).
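
The relationship between the two transcripts produced by opposing transcription units can be sketched in a few lines of code. This is a simplified illustration, not the disclosed construct: it assumes both promoters read through the entire transgene, ignores promoter, terminator and processing sequences, and uses hypothetical function names and a made-up six-base sequence.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite DNA strand, written 5' to 3'."""
    return dna.upper().translate(DNA_COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """Return the RNA whose sequence matches the given DNA strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def bidirectional_transcripts(transgene: str) -> tuple[str, str]:
    """Sense and antisense RNAs from two opposing units flanking one transgene."""
    sense = transcribe(transgene)                          # unit reading the top strand
    antisense = transcribe(reverse_complement(transgene))  # opposing unit
    return sense, antisense


def can_form_duplex(sense: str, antisense: str) -> bool:
    """True if the two RNAs are perfectly complementary in antiparallel orientation."""
    return antisense == sense.translate(RNA_COMPLEMENT)[::-1]
```

With the illustrative input `"ATGGCC"`, `bidirectional_transcripts` returns `("AUGGCC", "GGCCAU")`, and `can_form_duplex` confirms that the two transcripts can anneal into a perfectly double-stranded RNA.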

As used herein, a "promoter" is a DNA region capable under certain
conditions of binding RNA polymerase and initiating transcription of a coding
region located downstream (in the 3' direction) from the promoter. It can be
constitutive or inducible. In general, the promoter sequence is bounded at its 3'
terminus by the transcription initiation site and extends upstream (5' direction) to
include the minimum number of bases or elements necessary to initiate transcription
at levels detectable above background. Within the promoter sequence is a
transcription initiation site, as well as protein binding domains responsible for the
binding of RNA polymerase. Eukaryotic promoters will often, but not always,
contain "TATA" boxes and "CAT" boxes.

The choice of promoters will largely depend on the host cells into which the
vector is introduced. Commonly employed plant promoters include but are not
limited to those from Agrobacterium, nopaline synthase gene, octopine synthase gene,
mannopine synthase, rbcS (small subunit of ribulose bis-phosphate carboxylase). In
addition, the promoter sequences may be provided by viral material. Any RNA
virus subgenomic promoters described in Dawson et al. Advances in Virus
Research, 38:307-342 and WO93/03161 can thus be employed. For animal cells, a
variety of robust promoters, both viral and non-viral, are known in the art.
Non-limiting representative viral promoters include CMV, the early and late
promoters of SV40 virus, promoters of various types of adenoviruses (e.g.
adenovirus 2) and adeno-associated viruses. It is also possible, and often desirable,
to utilize promoters normally associated with a desired transgene sequence, provided
that such control sequences are compatible with the host cell system. See Goeddel
et al., Gene Expression Technology, Methods in Enzymology Volume 185,
Academic Press, San Diego, (1991); Ausubel et al., Protocols in Molecular Biology,
Wiley Interscience (1994).

Suitable promoter sequences for other eukaryotic cells such as yeast cells
include the promoters for 3-phosphoglycerate kinase, or other glycolytic enzymes,
such as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate
decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase,
3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase,
phosphoglucose isomerase, and glucokinase. Other promoters, which have the
additional advantage of transcription controlled by growth conditions, are the
promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid phosphatase,
degradative enzymes associated with nitrogen metabolism, and the aforementioned
glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose
and galactose utilization.

To optimize the yield of double-stranded RNAs formed from the sense and
anti-sense strands transcribed by the overlapping units, it is preferable to use two
promoters of comparable strength. The relative strength of the promoters can be
determined or ascertained by any conventional recombinant techniques and methods
exemplified herein. Representative techniques are Northern blot hybridization and
DNA array-based technologies. An illustrative promoter pair comprises the MSV mp
promoter and the CaMV 35S RNA promoter.

Where desired, heterologous promoters that are removed from their native
coding sequences and operatively linked to a transgene to which they are not naturally
found linked can be used in constructing the invention vectors. As such, any viral
promoters described above can be used to drive the transcription of non-viral
transgenes; promoters of one class of genes can be employed to direct transcription
of transgenes coding for other related or unrelated classes of proteins. In certain
embodiments of the invention, it is preferable to employ inducible promoters to
control the transcription of a transgene. A diverse variety of inducible promoters
have been described in the art. Promoters of any endogenous genes whose
expression is inducible by internal or external factors can be employed. Factors
applicable for transcription induction include but are not limited to hormones, heat
shock, oxygen deficiency, light, stress and various chemicals. Commonly employed
inducible promoters are the β-gal promoter that is activated upon addition of IPTG;
the hsp70 promoter that is inducible by heat shock; and the ribulose-1,5-bisphosphate
carboxylase (RUBISCO) promoter that is regulated by light.

Tissue-specific promoters may also be used. A vast diversity of
tissue-specific promoters has been described and employed by artisans in the field.
Representative plant tissue promoters include that of legumin (or other seed storage
protein promoters), patatin and the like. Exemplary promoters operative in selective
animal tissue include hepatocyte-specific promoters and cardiac muscle-specific
promoters. Depending on the intended use of the subject vectors, those skilled in the
art will know of other suitable tissue-specific promoters applicable for
non-constitutive bi-directional transcription.

In constructing the subject vectors, the termination sequences associated with
the transgene are also inserted into the 3' end of the sequence desired to be
transcribed to provide polyadenylation of the mRNA and/or a transcriptional
termination signal. The terminator sequence preferably contains one or more
transcriptional termination sequences (such as polyadenylation sequences) and may
also be lengthened by the inclusion of additional DNA sequence so as to further
disrupt transcriptional read-through. Preferred terminator sequences (or termination
sites) of the present invention have a gene that is followed by a transcription
termination sequence, either its own termination sequence or a heterologous
termination sequence. Examples of such termination sequences, including stop
codons coupled to various polyadenylation sequences, are known in the art,
widely available, and exemplified below. Where the terminator comprises a gene, it
can be advantageous to use a gene which encodes a detectable or selectable marker,
thereby providing a means by which the presence and/or absence of the terminator
sequence (and therefore the corresponding inactivation and/or activation of the
transcription unit) can be detected and/or selected. Alternatively, a terminator may
simply be a second promoter, arranged in inverted orientation to the promoter
described above.

The terminators and promoters of the two overlapping transcription units
may take a variety of configurations. In one aspect, terminators 1 and 2 of the
overlapping transcription units are arranged to immediately flank the transgene as
shown in Figure 2(a). In another aspect, the two terminators are placed at the 5' end
or the 3' end of their respective promoters as depicted in Figure 2(b). In other
aspects, terminator 1 and promoter 1 are flanked by terminator 2 and promoter 2 as
shown in Figure 2(c), or vice versa (see Figure 2(d)). Any other variations in
configuring the two overlapping transcription units that permit bi-directional
transcription are encompassed by the present invention.
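
By way of illustration only, the outcome of bi-directional transcription can be
sketched in Python as follows. The sketch assumes a short, made-up transgene
sequence and hypothetical function names; it ignores promoters, terminators and
RNA processing, and merely shows that transcribing both strands of the same
transgene yields two RNAs that are fully complementary to each other.

```python
from typing import Tuple

# Illustrative only: the sequence and all function names are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand written 5'->3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def transcribe(coding_strand: str) -> str:
    """Return the RNA that has the same sequence as the coding strand (T -> U)."""
    return coding_strand.upper().replace("T", "U")


def bidirectional_transcripts(transgene: str) -> Tuple[str, str]:
    """Return the sense and antisense RNAs made when both strands are transcribed."""
    sense = transcribe(transgene)
    antisense = transcribe(reverse_complement(transgene))
    return sense, antisense


def can_pair(rna_a: str, rna_b: str) -> bool:
    """True if the two RNAs are fully complementary in antiparallel orientation."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return len(rna_a) == len(rna_b) and all(
        pairs[a] == b for a, b in zip(rna_a, reversed(rna_b))
    )


sense, antisense = bidirectional_transcripts("ATGGCCTTAGAC")
print(sense)
print(antisense)
print(can_pair(sense, antisense))
```

The final check reports whether the two transcripts could, in principle, anneal
into a double-stranded RNA over their full length.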

The transgene transcribed by an invention vector can be any gene expressed
in a eukaryotic cell. The selection of the transgene is determined largely by the
intended purpose of the vector. Where the vector is used to inhibit expression of an
endogenous gene present in a host cell, the transgene selected is substantially
homologous to the target endogenous gene. In general, substantially homologous
nucleotide sequences are at least about 60% identical with each other, after
alignment of the homologous regions. Preferably, the sequences are at least about
75% identical; more preferably, they are at least about 80% identical; more
preferably, they are at least about 90% identical; still more preferably, the sequences
are 95% identical.

Sequence alignment and homology searches are often performed with the
aid of computer methods. A variety of software programs are available in the art.
Non-limiting examples of these programs are Blast
(http://www.ncbi.nlm.nih.gov/BLAST/), Fasta (Genetics Computer Group
package, Madison, Wisconsin), DNA Star, MegAlign, and GeneJockey. Any
sequence database that contains DNA sequences corresponding to a target gene or a
segment thereof can be used for sequence analysis. Commonly employed databases
include but are not limited to GenBank, EMBL, DDBJ, PDB, SWISS-PROT, EST,
STS, GSS, and HTGS. Sequence similarity can be discerned by aligning the
transgene sequence against a target endogenous gene sequence. Common
parameters for determining the extent of homology set forth by one or more of the
aforementioned alignment programs include p value and percent sequence identity.
P value is the probability that the alignment is produced by chance. For a single
alignment, the p value can be calculated according to Karlin et al. (1990) Proc. Natl.
Acad. Sci. 87:2264. For multiple alignments, the p value can be calculated using a
heuristic approach such as the one programmed in Blast. Percent sequence identity
is defined by the ratio of the number of nucleotide matches between the query
sequence and the known sequence when the two are optimally aligned. A selected
transgene and target endogenous sequences are considered to be substantially
homologous when the regions of alignment exhibit the aforementioned range of
percentage of identity using the Fasta or Blast alignment program with the default
settings.
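
As a minimal sketch, the percent identity of two already-aligned sequences can
be computed and compared against the identity levels recited above (60%, 75%,
80%, 90% and 95%). The text does not state the denominator of the ratio, so the
sketch assumes it is the number of aligned columns that contain no gap; it also
treats each "about" level as a hard cutoff. The example sequences and function
names are hypothetical, and the alignment itself is taken as given rather than
produced by Fasta or Blast.

```python
from typing import Optional

# Identity levels recited in the text, highest first.
IDENTITY_LEVELS = (95.0, 90.0, 80.0, 75.0, 60.0)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free aligned columns in which the two residues match.

    Assumption: gaps ('-') are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


def highest_level_met(identity: float) -> Optional[float]:
    """Return the highest recited identity level reached, or None if below 60%."""
    for level in IDENTITY_LEVELS:
        if identity >= level:
            return level
    return None


# Hypothetical aligned pair: one mismatch and one gap.
transgene = "ATGGCCTTAGAC"
endogenous = "ATGGCATT-GAC"
identity = percent_identity(transgene, endogenous)
print(round(identity, 1), highest_level_met(identity))
```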

Sequence homology can also be determined by functional analyses. A
sequence that preserves the functionality of the nucleic acid with which it is being
compared is particularly preferred. Functionality may be established by different
criteria, such as ability to hybridize with a target polynucleotide, ability to
effectively amplify a target sequence to yield a substantially homogenous
multiplicity of products, and the ability to extend the 3' end sequence
complementary to a target sequence in a nucleotide sequencing reaction.

Where desired, the transgene may comprise heterologous sequences that
facilitate detection of the expression and purification of the gene product. Examples
of such sequences are known in the art and include those encoding reporter proteins
such as β-galactosidase, β-lactamase, chloramphenicol acetyltransferase (CAT),
luciferase, green fluorescent protein (GFP) and their derivatives. Other heterologous
sequences that facilitate purification may code for epitopes such as Myc, HA
(derived from influenza virus hemagglutinin), His-6, FLAG, glutathione
S-transferase (GST), maltose-binding protein (MBP), or the Fc portion of
immunoglobulin.

The target endogenous genes whose expression is to be inhibited encompass
native and heterologous genes present in the host cell. "Native" genes are nucleic
acid sequences originating from the host cell. Non-limiting illustrative native genes
include those that encode membrane proteins, cytosolic proteins, secreted proteins,
nuclear proteins and chaperone proteins. Heterologous genes are sequences acquired
exogenously by the host cell. Exogenous sequences can be either integrated into the
host cell genome, or maintained as episomal sequences. An exemplary class of
heterologous genes includes pathogenic genes derived from viruses, bacteria, fungi,
and protozoa.

The endogenous genes suitable for the present invention may also be
characterized based on one or more of the following features: ability to induce a
phenotypic change in a host cell or organism, species origin, developmental origin,
primary structural similarity, involvement in a particular biological process,
association with or resistance to a particular disease or disease stage, tissue,
sub-tissue or cell-specific expression pattern, and subcellular location of the
expressed gene product. In one aspect, the endogenous gene may be any gene
expressed in a eukaryotic cell, such as a plant cell, animal cell or a yeast cell. In
another aspect, the endogenous gene confers a phenotypic characteristic detectable
by visual, microscopic, genetic, or chemical means. Within this class of genes, of
particular interest are plant genes involved in growth phenotypes, e.g. stunting,
hyperbranching, vein banding, ring spot, etching, and those responsible for color
characteristics including bleaching and chlorosis. Also of particular relevance are
genes which upon inhibition provide an enhanced resistance to pathogens (e.g.
bacteria, fungi, viruses, insects, and protozoa), and resistance to adverse
environmental factors (e.g. temperature fluctuation, nutritional deficiency, adverse
soil conditions, moisture, dryness, etc.).

In another aspect, the endogenous genes are of a specific developmental
origin, such as those expressed in an embryo or an adult organism, during ectoderm,
mesoderm, or endoderm formation in a multi-cellular animal, or during development
of leaves, tubers, or buds of a plant. In yet another aspect, the endogenous genes
belong to a family of genes, or a sub-family of genes, that share primary structural
similarities. Structural similarities can be discerned with the aid of the computer
software described above. Non-limiting examples of gene families include those
encoding proteinases, proteinase inhibitors, cell surface receptors, protein kinases
(e.g. tyrosine, serine/threonine or histidine kinases), trimeric G-proteins, cytokines,
PH-, SH2-, SH3-, PDZ-domain containing proteins, and any of those gene families






published by the Institute for Genomic Research (TIGR), Incyte Pharmaceuticals, 
Inc., Human Genome Sciences Inc., Monsanto, and PE-Celera. 

In yet another aspect, the endogenous genes are involved in a specific
biological process, including but not limited to cell cycle regulation, cell
differentiation, chemotaxis, apoptosis, cell motility and cytoskeletal rearrangement.

In still another aspect, the endogenous genes embodied in the invention are
associated with a particular disease or with a specific disease stage. Such genes
include but are not limited to those associated with autoimmune diseases, obesity,
hypertension, diabetes, neuronal and/or muscular degenerative diseases, cardiac
diseases, endocrine disorders, and any combinations thereof. In yet still another
aspect, the endogenous genes encompass those exhibiting restricted expression
patterns. Non-limiting exemplary gene transcripts of this class include those that are
not ubiquitously expressed, but rather are differentially expressed in one or more of
the plant tissues including leaf, seed, tuber, stems, root, and bud; or expressed in
animal body tissues including heart, liver, prostate, lung, kidney, bone marrow,
blood, skin, bladder, brain, muscles, nerves, and selected tissues that are affected by
various types of cancer (malignant or non-metastatic), or affected by cystic fibrosis
or polycystic kidney disease. Additional examples of non-ubiquitously expressed
genes are those whose gene products are localized to certain subcellular locations:
extracellular matrix, nucleus, cytoplasm, cytoskeleton, plasma and/or intracellular
membranous structures which include but are not limited to coated pits, Golgi
apparatus, endoplasmic reticulum, endosome, lysosome, and mitochondria.

In addition to the above-described elements, the vectors may contain a
selectable marker (for example, a gene encoding a protein necessary for the survival
or growth of a host cell transformed with the vector), although such a marker gene
can be carried on another polynucleotide sequence co-introduced into the host cell.
Only those host cells into which a selectable gene has been introduced will survive
and/or grow under selective conditions. Typical selection genes encode protein(s)
that (a) confer resistance to antibiotics or other toxic substances, e.g., ampicillin,
neomycin, methotrexate, etc.; (b) complement auxotrophic deficiencies; or (c)
supply critical nutrients not available from complex media. The choice of the proper
marker gene will depend on the host cell, and appropriate genes for different hosts
are known in the art.

The vectors embodied in this invention can be obtained using recombinant
cloning methods and/or by chemical synthesis. A vast number of recombinant
cloning techniques such as PCR, restriction endonuclease digestion and ligation are
well known in the art, and need not be described in detail herein. One of skill in the
art can also use the sequence data provided herein or that in the public or proprietary
databases to obtain a desired vector by any synthetic means available in the art.

Host cells and transgenic organisms of the present invention:

The invention provides eukaryotic host cells transformed with the
recombinant DNA vectors described above. The recombinant vectors containing the
transgene of interest can be introduced into a suitable eukaryotic cell by any of a
number of appropriate means, including electroporation; transfection employing
calcium chloride, rubidium chloride, calcium phosphate, DEAE-dextran, or other
substances; microprojectile bombardment; lipofection; and infection (where the
vector is coupled to an infectious agent). The choice of means for introducing
vectors will often depend on features of the host cell.

For most animal cells, any of the above-mentioned methods is suitable for
vector delivery. For plant cells, a variety of techniques derived from these general
methods is available in the art. The host cells may be in the form of whole plants,
isolated cells or protoplasts. Preferably, the cells are "intact" in that the cell
comprises an outer layer of cell wall, typically composed of cellulose for protection
and maintaining the rigidity of the plant cell. Illustrative procedures for introducing
vectors into plant cells include Agrobacterium-mediated plant transformation,
protoplast transformation, gene transfer into pollen, injection into reproductive
organs and injection into immature embryos. As is evident to one skilled in the art,
each of these methods has distinct advantages and disadvantages. Thus, one
particular method of introducing genes into a particular plant species may not
necessarily be the most effective for another plant species.

Agrobacterium tumefaciens-mediated transfer is a widely applicable system
for introducing genes into plant cells because the DNA can be introduced into whole
plant tissues, bypassing the need for regeneration of an intact plant from a
protoplast. The use of Agrobacterium-mediated expression vectors to introduce
DNA into plant cells is well known in the art. This technique makes use of a
common feature of Agrobacterium, which colonizes plants by transferring a portion
of its DNA (the T-DNA) into a host cell, where it becomes integrated into nuclear
DNA. The T-DNA is defined by border sequences which are 25 base pairs long, and
any DNA between these border sequences is transferred to the plant cells as well.
The insertion of a recombinant plant viral nucleic acid between the T-DNA border
sequences results in transfer of the recombinant plant viral nucleic acid to the plant
cells, where the recombinant plant viral nucleic acid is replicated, and then spreads
systemically through the plant. Agro-infection has been accomplished with potato
spindle tuber viroid (PSTV); CaV; and Lazarowitz, S., Nucl. Acids Res. 16:229
(1988)); digitaria streak virus (Donson et al., Virology 162:248 (1988)); wheat dwarf
and tomato golden mosaic virus (TGMV). Therefore, agro-infection of a susceptible
plant could be accomplished with a virion containing a recombinant plant viral
nucleic acid based on the nucleotide sequence of any of the above viruses. Particle
bombardment or electroporation or any other methods known in the art may also be
used.

Because not all plants are natural hosts for Agrobacterium, alternative
methods such as transformation of protoplasts may be employed to introduce the
subject vectors into the host cells. For certain monocots, transformation of the plant
protoplasts can be achieved using methods based on calcium phosphate
precipitation, polyethylene glycol treatment, electroporation, and combinations of
these treatments. See, for example, Potrykus et al., Mol. Gen. Genet., 199:167-177
(1985); Fromm et al., Nature, 319:791 (1986); Callis et al., Genes and Development,
1:1183 (1987). Applicability of these techniques to different plant species may
depend upon the feasibility of regenerating that particular plant species from
protoplasts.

In addition to protoplast transformation, particle bombardment is an
alternative and convenient technique for delivering the invention vectors into a plant
host cell. Specifically, the plant cells may be bombarded with microparticles coated
with a plurality of the subject vectors. Bombardment with DNA-coated
microprojectiles has been successfully used to produce stable transformants in both
plants and animals (see, for example, Sanford et al. (1993) Methods in Enzymology,
217:483-509). Microparticles suitable for introducing vectors into a plant cell are
typically made of metal, preferably tungsten or gold. These microparticles are
available, for example, from BioRad (e.g., Bio-Rad's PDS-1000/He). Those skilled
in the art will know that the particle bombardment protocol can be optimized for any
plant by varying parameters such as He pressure, quantity of coated particles,
distance between the macrocarrier and the stopping screen, and flying distance from
the stopping screen to the target.
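
One way to organize such an optimization is to enumerate every combination of
the four parameters named above and test each in turn. The following Python
sketch is offered only as an illustration: the parameter names, units and
candidate values are hypothetical placeholders, not recommended settings and
not values taken from this specification.

```python
from itertools import product

# Hypothetical candidate values; none of these numbers come from the text.
parameter_grid = {
    "helium_pressure_psi": [900, 1100, 1350],
    "coated_particles_ug": [250, 500],
    "macrocarrier_to_screen_mm": [6, 10],
    "screen_to_target_cm": [6, 9],
}


def parameter_combinations(grid):
    """Yield one dict per combination of the candidate values in the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


conditions = list(parameter_combinations(parameter_grid))
print(len(conditions))   # number of bombardment conditions to test
print(conditions[0])     # first condition in the sweep
```

Each resulting condition would then be scored by whatever transformation
readout the practitioner prefers, for example the number of stable
transformants recovered.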

Vectors can also be introduced into plants by direct DNA transfer into pollen
as described by Zhou et al., Methods in Enzymology, 101:433 (1983); Luo et al.,
Plant Mol. Biol. Reporter, 6:165 (1988). Alternatively, the vectors can be injected
into reproductive organs of a plant as described by Pena et al., Nature, 325:274
(1987).

Other techniques for introducing nucleic acids into a plant cell include:

(a) Hand Inoculations. Hand inoculations are performed using a neutral pH, low
molarity phosphate buffer, with the addition of celite or carborundum
(usually about 1%). One to four drops of the preparation are put onto the
upper surface of a leaf and gently rubbed.

(b) Mechanized Inoculations of Plant Beds. Plant bed inoculations are
performed by spraying (gas-propelled) the vector solution into a
tractor-driven mower while cutting the leaves. Alternatively, the plant bed is
mowed and the vector solution sprayed immediately onto the cut leaves.

(c) High Pressure Spray of Single Leaves. Single plant inoculations can also be
performed by spraying the leaves with a narrow, directed spray (50 psi, 6-12
inches from the leaf) containing approximately 1% carborundum in the
buffered vector solution.

(d) Vacuum Infiltration. Inoculations may be accomplished by subjecting a host
organism to a substantially vacuum pressure environment in order to
facilitate infection.

Once introduced into a suitable host cell, expression of the transgene can be
determined using any assay known in the art. For example, the presence of
transcribed sense or anti-sense strands of the transgene can be detected and/or
quantified by conventional hybridization assays (e.g. Northern blot analysis),
amplification procedures (e.g. RT-PCR), SAGE (U.S. Patent No. 5,695,937), and
array-based technologies (see e.g. U.S. Pat. Nos. 5,405,783, 5,412,087 and
5,445,934). In conducting these analytical procedures, it is preferable to induce
transcription of one strand of the transgene at a time. As is apparent to one skilled in
the art, the simultaneous transcription of both sense and anti-sense strands facilitates
formation of double stranded RNA molecules, which may obscure the accurate
determination of the levels of sense and anti-sense RNA transcripts.

Expression of the transgene can also be determined by examining the protein
product. A variety of techniques are available in the art for protein analysis. They
include but are not limited to radioimmunoassays, ELISA (enzyme-linked
immunosorbent assays), "sandwich" immunoassays, immunoradiometric assays,
in situ immunoassays (using e.g., colloidal gold, enzyme or radioisotope labels),
western blot analysis, immunoprecipitation assays, immunofluorescent assays, and
SDS-PAGE.

In general, determining the protein level involves (a) providing a biological
sample containing polypeptides; and (b) measuring the amount of any
immunospecific binding that occurs between an antibody reactive to the transgene
product and a component in the sample, in which the amount of immunospecific
binding indicates the level of expressed proteins. Antibodies that specifically
recognize and bind to the protein products of the transgene are required for
immunoassays. These may be purchased from commercial vendors or generated and
screened using methods well known in the art. See Harlow and Lane (1988) supra,
and Sambrook et al. (1989) supra. The sample of test proteins can be prepared by
homogenizing the eukaryotic transformants (e.g. plant cells) or their progenies made
therefrom, and optionally solubilizing the test protein using detergents, preferably
non-reducing detergents such as Triton and digitonin. The binding reaction in which
the test proteins are allowed to interact with the detecting antibodies may be
performed in solution, or on a solid tissue sample, for example, using tissue sections
or a solid support on which the test proteins have been immobilized. The formation
of the complex can be detected by a number of techniques known in the art. For
example, the antibodies may be supplied with a label and unreacted antibodies may
be removed from the complex, the amount of remaining label thereby indicating the
amount of complex formed. Results obtained using any such assay on a sample
from a plant transformant or a progeny thereof are compared with those from a
non-transformed source as a control.

The eukaryotic host cells of this invention are grown under favorable
conditions to effect transcription of the transgene. Non-limiting examples of
eukaryotic hosts are fungal, yeast, plant, insect, avian, mammalian or other
animal cells. The host cells can be used, inter alia, as repositories of the transgene
and/or vehicles for production of the transgene-specific double stranded RNAs. The
host cells may also be employed to generate transgenic organisms such as transgenic
animals and plants comprising the recombinant DNA vectors of the present
invention. Preferred host cells are those having the propensity to regenerate into
tissue or a whole organism. Examples of these preferred host cells are oocytes,
blastocytes, and certain plant cells exemplified herein.

Accordingly, this invention provides transgenic plants carrying the subject
vectors. In a preferred embodiment, the transgenic plant exhibits a reduced
expression (when compared to a control plant) of an endogenous gene that is
substantially homologous to the transgene carried in the subject vector.

The regeneration of plants from either single plant protoplasts or various
explants is well known in the art. See, for example, Methods for Plant Molecular
Biology, Mary A. Shuler and Raymond E. Zielinski, Academic Press, Inc., San
Diego, Calif. (1988). This regeneration and growth process includes the steps of
selection of transformant cells and shoots, rooting the transformant shoots and
growth of the plantlets in soil.

The regeneration of plants containing the subject vector introduced by
Agrobacterium tumefaciens from leaf explants can be achieved as described by
Horsch et al., Science, 227:1229-1231 (1985). In this procedure, transformants are
grown in the presence of a selection agent and in a medium that induces the
regeneration of shoots in the plant species being transformed as described by Fraley
et al., Proc. Natl. Acad. Sci. U.S.A., 80:4803 (1983). This procedure typically
produces shoots within two to four weeks and these transformant shoots are then
transferred to an appropriate root-inducing medium containing the selective agent
and an antibiotic to prevent bacterial growth. Transformant shoots that rooted in the
presence of the selective agent to form plantlets are then transplanted to soil to allow
the production of roots. These procedures will vary depending upon the particular
plant species employed, as is apparent to one of ordinary skill in the art.

A population of progeny can be produced from the first and second
transformants of a plant species by methods well known in the art including cross
fertilization and asexual reproduction. Transgenic plants embodied in the present
invention are useful for production of desired proteins, and as test systems for
analysis of the biological functions of a gene.

Uses of the vectors of the present invention: 

The subject vectors provide specific reagents for inhibiting expression of an
endogenous gene present in a host cell. The expression inhibition methods may be
used in a wide variety of circumstances including suppression of a gene associated
with a particular disease or disease stage; delineating the biological functions of a
gene by analyzing a phenotypic change in the host cell that correlates with the
selective suppression of gene expression; and facilitating drug screening by
rendering the host cell more susceptible or resistant to a therapeutic agent of interest.

Accordingly, this invention provides a method of inhibiting expression of an
endogenous gene present in a eukaryotic cell. The method comprises the steps of:
(a) providing a subject vector containing a transgene that is substantially
homologous to an endogenous gene of a eukaryotic cell; (b) introducing the
recombinant vector into the eukaryotic cell; and (c) culturing the eukaryotic cell of
(b) under conditions favorable for expression of both sense and antisense RNA
transcripts from the transgene, and thereby inhibiting expression of the
corresponding endogenous gene in the eukaryotic cell.

In a separate embodiment, the invention provides a method of identifying a
biological function(s) of an endogenous gene of interest in a eukaryotic cell by
selectively inhibiting the expression of the endogenous gene. The method involves:
(a) providing a recombinant vector of the present invention, wherein the transgene
contained in the vector is substantially homologous to the endogenous gene; (b)
introducing the recombinant vector of (a) into the eukaryotic cell; (c) culturing the
eukaryotic cell of (b) under conditions favorable for expression of both sense and
antisense RNA transcripts from the transgene contained in the recombinant vector
and thereby inhibiting expression of the endogenous gene in the eukaryotic cell; and
(d) determining one or more phenotypic changes in the eukaryotic cell that correlate
with the inhibited expression of the endogenous gene, thereby identifying the
biological function(s) of the endogenous gene in the eukaryotic cell.

The host cells encompassed by these embodiments are eukaryotic cells
susceptible to dsRNA-mediated "genetic interference". dsRNA-induced gene
silencing has been observed in a variety of multi-cellular organisms including but
not limited to worms, fruit flies, protozoa, fungi, mammals, and zebrafish. Thus,
cells from any of these exemplary organisms can be employed. Suitable host cells
may be derived from primary cultures or subcultures generated by expansion and/or
cloning of primary cultures. Any cells capable of growth in culture can be used as
host cells. Of particular interest is the type of cell that differentially expresses
(over-expresses or under-expresses) a disease-causing gene. As is apparent to one
skilled in the art, various cell lines may be obtained from public or private
repositories. The largest depository agent is the American Type Culture Collection
(http://www.atcc.org), which offers a diverse collection of well-characterized cell
lines derived from a vast number of organisms and tissue samples.

Upon delivery of the subject vectors, the host cells are cultured under
conditions favorable for gene transcription. The parameters governing eukaryotic
cell survival are generally applicable for induction of gene transcription. The culture
conditions are well established in the art. Physicochemical parameters which may
be controlled in vitro are, e.g., pH, CO2, temperature, and osmolality. The
nutritional requirements of cells are usually provided in standard media formulations
developed to provide an optimal environment. Nutrients can be divided into several
categories: amino acids and their derivatives, carbohydrates, sugars, fatty acids,
complex lipids, nucleic acid derivatives and vitamins. Apart from nutrients for
maintaining cell metabolism, most cells also require one or more hormones from at
least one of the following groups: steroids, prostaglandins, growth factors, pituitary
hormones, and peptide hormones to survive or proliferate (Sato, G.H., et al. in
"Growth of Cells in Hormonally Defined Media", Cold Spring Harbor Press, N.Y.,
1982; Barnes and Sato (1980) Anal. Biochem., 102:255). Given the vast wealth of
information on the nutrient requirements and medium conditions optimized for cell
survival, one skilled in the art can readily fashion various culture conditions using
any one of the aforementioned methods and compositions, alone or in any
combination.

The inhibition of expression of the endogenous gene sharing substantial
sequence homology with the transgene carried in the vectors can be determined by
assaying for a difference, between the host cell and the control cell, in the level of
mRNA transcripts of the endogenous gene. Alternatively, a suppression in
expression is determined by detecting a difference in the level of the polypeptide(s)
encoded by the endogenous gene. A preferred method is to detect a phenotypic
change resulting from the decrease in expression of the endogenous gene of interest.
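
The comparison between host cell and control cell can be reduced to a single
figure, for instance the percent reduction of the mean signal relative to the
control. The Python sketch below is one hedged way to express that arithmetic;
the function name and the replicate band intensities are hypothetical and are
not measurements reported in this specification.

```python
from statistics import mean


def percent_inhibition(transformant_levels, control_levels):
    """Percent reduction of the mean transformant signal relative to control."""
    control_mean = mean(control_levels)
    if control_mean <= 0:
        raise ValueError("control signal must be positive")
    return 100.0 * (1.0 - mean(transformant_levels) / control_mean)


# Hypothetical replicate signals (arbitrary units) for the endogenous mRNA.
control = [100.0, 110.0, 90.0]
transformant = [20.0, 30.0, 25.0]
print(round(percent_inhibition(transformant, control), 1))
```

A value near zero would indicate no detectable inhibition, and a negative value
would indicate that the signal in the transformant exceeds that of the control.
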
In assaying for an alteration in mRNA level, nucleic acid contained in the
host cells is first extracted according to standard methods in the art. For instance,
mRNA can be isolated using various lytic enzymes or chemical solutions according
to the procedures set forth in Sambrook et al. (1989), supra, or extracted by
nucleic-acid-binding resins following the accompanying instructions provided by
manufacturers. The mRNA contained in the extracted nucleic acid sample is then
detected by hybridization (e.g. Northern blot analysis) and/or amplification
procedures according to methods widely known in the art or based on the methods
exemplified herein.

Reduction in expression of the endogenous gene can also be determined by
examining the protein product of the endogenous gene. A variety of techniques is
available in the art for protein analysis. They include but are not limited to
radioimmunoassays, ELISA (enzyme-linked immunosorbent assays),
"sandwich" immunoassays, immunoradiometric assays, in situ immunoassays (using
colloidal gold, enzyme or radioisotope labels), western blot analysis,
immunoprecipitation assays, immunofluorescent assays, and SDS-PAGE. In
addition, cell sorting analysis can be employed to detect cell surface antigens. Such
analysis involves labeling target cells with antibodies coupled to a detectable agent,
and then separating the labeled cells from the unlabeled ones in a cell sorter. A
sophisticated cell separation method is fluorescence-activated cell sorting (FACS).
Cells traveling in single file in a fine stream are passed through a laser beam, and the
fluorescence of each cell bound by the fluorescently labeled antibodies is then
measured.

Antibodies that specifically recognize and bind to the protein products of
interest are required for conducting the aforementioned protein analyses. These
antibodies may be purchased from commercial vendors or generated and screened
using methods well known in the art. See Harlow and Lane (1988) supra, and
Sambrook et al. (1989) supra.

Inhibition of gene expression can also result in phenotypic change(s) in a
host cell. As used herein, phenotypic change refers to any non-genotypic change
that can be detected visually, or analyzed biochemically or genetically. The choice
of detection methods will largely depend on the nature of the phenotypic
characteristics that are under investigation. For instance, certain phenotypic features
of a plant cell can be detected microscopically or macroscopically. These features
include improved tolerance to herbicides; improved tolerance to extremes of heat or
cold, drought, salinity or osmotic stress; improved resistance to pests (insects,
nematodes or arachnids) or diseases (fungal, bacterial or viral); production of
enzymes or secondary metabolites; male or female sterility; dwarfness; early
maturity; improved yield, vigor, heterosis, nutritional qualities, flavor or processing
properties; and the like. Other detectable phenotypic changes are morphological
alterations including but not limited to stunting, hyperbranching, vein banding, ring
spot, etching, and those responsible for color characteristics including bleaching and
chlorosis.

For animal cells, detectable phenotypic changes may encompass alterations 
in cell cycle regulation, cell differentiation, apoptosis, chemotaxis, cell motility and
cytoskeletal rearrangement. Methods for detecting these phenotypic changes are 
well-established in the art and hence are not detailed herein. 

Other phenotypic changes commonly observed in both plant and animal cells
involve differential expression (over-expression or under-expression) of a particular
protein due to the selective inhibition of the endogenous gene of interest.
Differential gene expression may be analyzed by any chemical means available in
the art or those disclosed herein. As is also apparent to artisans, altering expression
of one endogenous gene may lead to changes in the gene expression profile of a host of
genes mapped to the same or related signal transduction pathways. As used herein,
"signal transduction" refers to the process by which stimulatory or inhibitory signals
are transmitted into and within a cell to elicit an intracellular response. Any
fluctuation in intracellular response of a eukaryotic host cell is also considered a
type of phenotypic change.

Alteration in intracellular response is often determined with the aid of
reporter molecules. For example, when examining a signaling cascade involving a
fluctuation of intracellular pH, pH-sensitive molecules such as fluorescent
pH dyes can be used as the reporter molecules. In another example, where the
signaling pathway of a trimeric Gq protein is analyzed, calcium-sensitive fluorescent
probes can be employed as reporters. As is apparent to artisans in the field of signal
transduction, trimeric Gq protein is involved in a classic signaling pathway, in which
activation of Gq stimulates hydrolysis of phosphoinositides by phospholipase C to
generate two classes of well-characterized second messengers, namely,
diacylglycerol and inositol phosphates. The latter stimulate the mobilization of
calcium from intracellular stores, thus resulting in a transient surge of
intracellular calcium concentration, which is a readout measurable with a
calcium-sensitive probe.

Another exemplary class of reporter molecules is a reporter gene operably
linked to an inducible promoter that can be activated upon the stimulation or
inhibition of a signaling pathway. Reporter proteins can also be linked with other
proteins whose expression is dependent upon the stimulation or suppression of a
given signaling cascade. Commonly employed reporter proteins can be easily
detected by a colorimetric or fluorescent assay. Non-limiting examples of such
reporter proteins include: β-galactosidase, β-lactamase, chloramphenicol
acetyltransferase (CAT), luciferase, green fluorescent protein (GFP) and their
derivatives. Those skilled in the art will know of other suitable reporter molecules
for assaying changes in a specific signal transduction readout, or will be able to
ascertain such, using routine experimentation.

To discern inhibition of gene expression, one typically conducts a
comparative analysis of the subject and appropriate controls. Preferably, a test
includes a positive control sample exhibiting a decrease in gene expression and a
negative control having an unaltered expression level. The selection of an
appropriate control cell or tissue is dependent on the sample cell or tissue initially
selected and its phenotype which is under investigation.

In one aspect, the invention methods can be employed to selectively inhibit
expression of an endogenous gene that is native to the eukaryotic host cell. Such a
gene may encode a protein selected from the group consisting of a
membrane protein, a cytosolic protein, a secreted protein, a nuclear protein and a
chaperone protein. Of particular interest are endogenous genes that confer
phenotypic changes as a result of inhibition of the expression and/or function of the
endogenous genes. In another aspect within this embodiment, the endogenous gene
is heterologous to the host cell. As used herein, heterologous genes are acquired
exogenously by the host cell. Non-limiting examples of heterologous genes are
those derived from virus, bacterium, fungus, and protozoa.

In a separate embodiment, the invention methods are used to identify a
biological function(s) of an endogenous gene in a eukaryotic cell by examining a
phenotypic change associated with the inhibition of its expression and thus loss of
biological function. In essence, the subject methods allow the creation of a transient
or more long-term gene-specific knock-out system for analyzing the biological
function of any endogenous gene of interest.

Kits comprising the vectors of the present invention 

The present invention also encompasses kits containing the vectors of this
invention in suitable packaging. Kits embodied by this invention include those that
allow generation of a double-stranded RNA transcript in a eukaryotic cell.

Each kit necessarily comprises the reagents which render the delivery of
vectors into a eukaryotic host cell possible. The selection of reagents that facilitate
delivery of the vectors may vary depending on the particular transfection or
infection method used. The kits may also contain reagents useful for generating
labeled polynucleotide probes or proteinaceous probes for detection of gene
silencing. Each reagent can be supplied in a solid form or dissolved/suspended in a
liquid buffer suitable for inventory storage, and later for exchange or addition into
the reaction medium when the experiment is performed. Suitable packaging is
provided. The kit can optionally provide additional components that are useful in
the procedure. These optional components include, but are not limited to, buffers,
capture reagents, developing reagents, labels, reacting surfaces, means for detection,
control samples, instructions, and interpretive information. The kits can be
employed to generate eukaryotic cells whose endogenous genes are selectively
inhibited, and transgenic organisms comprising these eukaryotic cells.

Further illustration of the development and use of vectors and assays
according to this invention is provided in the Example section below. The
examples are provided as a guide to a practitioner of ordinary skill in the art, and
are not meant to be limiting in any way.

EXAMPLES

Example 1: Construction of recombinant vectors comprising two opposing
transcription units

We have designed a recombinant vector construct useful for silencing
nuclear genes in many of the agriculturally important cereal crops. The vector
comprises sequences derived from maize streak geminivirus isolate MSV-Kom
(GenBank accession number AF003252; classification: family Geminiviridae, genus
Mastrevirus, species maize streak virus), designated MSV-Komatipoort. Maize
streak virus has a broad host range that encompasses all agriculturally important
cereal crops, including but not limited to corn, wheat, rice, barley, rye, sorghum and
millet. The methods for construction of infectious geminiviruses are well known to
those skilled in the art, and are described in European patent application 8687015.5
as well as in US Patent No. 5,569,597.

We have synthesized a 1618 base pair synthetic DNA that contains the
MSV-Kom repA and repB, long intergenic region (LIR) and short intergenic region
(SIR), and thus all sequences that are required for viral replication (Palmer et
al. (1999) Archives of Virology 144:1345-1360). This fragment was cloned into the
pZeRO-2 vector (Invitrogen) as an EcoRI-XbaI fragment, to create the plasmid
pMSVLSB-1, the sequence of which is shown in Figure 4. A 171 base pair
fragment containing the movement protein (mp) promoter of MSV-Kom is
synthesized and cloned into the pZeRO-2 vector as a HindIII-EcoRI fragment to
create pMSVLSB-2 (sequence shown in Figure 5). The ApaI fragment containing
the mp promoter is inserted between the two ApaI sites in pMSVLSB-1, to create
pMSVLSB-3 (sequence shown in Figure 6).

The cauliflower mosaic virus 35S RNA promoter (CaMV 35S promoter)
sequence is amplified with a vector containing this sequence (pBI121, from
Clontech) as template DNA, using the following PCR primers containing the
following restriction sites (shown in italics): EcoRI in CaMV35SF and SalI in
CaMV35SR.

CaMV35SF:
TTTGAATTCGTCAACATGGTGGAGCAC (SEQ ID NO:1)

CaMV35SR:
TTTGTCGACGTCCTCTCCAAATGAAATGAAC (SEQ ID NO:2)

The resulting CaMV 35S promoter PCR product is digested with EcoRI and
SalI and the restricted fragments are purified.

The zeocin resistance gene is amplified by PCR with the vector pZeRO1
(Invitrogen) as template, using the following primers containing the following
restriction sites (shown in italics): SalI, PacI and NotI in ZeoF and XhoI, PacI and
NotI in ZeoR:

ZeoF:
CCCGTCGACTTAATTAAGCGGCCGCGTTTACAATTTCGCCTGATGC (SEQ ID NO:3)

ZeoR:
CCCCTCGAGTTAATTAAGCGGCCGCCTCAAAAAGGATCTTCACCTAG (SEQ ID NO:4)

The resulting zeocin resistance gene PCR product is digested with XhoI and SalI
and purified.

The nopaline synthase (nos) terminator sequence is amplified by PCR with
the vector pBI121 (Clontech) as template, using the following primers, with
restriction sites XhoI in NosF and SpeI in NosR italicized:

NosF:
TTTCTCGAGCGAATTTCCCCGATCGTTCAAAC (SEQ ID NO:5)

NosR:
TTTACTAGTCCCGATCTAGTAACATAGATGAC (SEQ ID NO:6)

The resulting nos terminator PCR product is digested with XhoI and SpeI and
purified. 
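
The placement of restriction sites in the six primers above can be checked with a short script. The following is a minimal sketch: the recognition sequences are the standard ones for the named enzymes, the primer strings are the sequences of this example with spaces removed (as reconstructed from the text), and the helper name `find_sites` is hypothetical.

```python
# Standard recognition sequences of the enzymes named in Example 1.
SITES = {
    "EcoRI": "GAATTC",
    "SalI": "GTCGAC",
    "XhoI": "CTCGAG",
    "SpeI": "ACTAGT",
    "PacI": "TTAATTAA",
    "NotI": "GCGGCCGC",
}

# Primer sequences as reconstructed from the text, paired with the sites
# each primer is described as carrying.
PRIMERS = {
    "CaMV35SF": ("TTTGAATTCGTCAACATGGTGGAGCAC", ["EcoRI"]),
    "CaMV35SR": ("TTTGTCGACGTCCTCTCCAAATGAAATGAAC", ["SalI"]),
    "ZeoF": ("CCCGTCGACTTAATTAAGCGGCCGCGTTTACAATTTCGCCTGATGC",
             ["SalI", "PacI", "NotI"]),
    "ZeoR": ("CCCCTCGAGTTAATTAAGCGGCCGCCTCAAAAAGGATCTTCACCTAG",
             ["XhoI", "PacI", "NotI"]),
    "NosF": ("TTTCTCGAGCGAATTTCCCCGATCGTTCAAAC", ["XhoI"]),
    "NosR": ("TTTACTAGTCCCGATCTAGTAACATAGATGAC", ["SpeI"]),
}


def find_sites(seq):
    """Map each enzyme name to the 0-based start positions of its site in seq."""
    hits = {}
    for enzyme, site in SITES.items():
        positions = [i for i in range(len(seq) - len(site) + 1)
                     if seq.startswith(site, i)]
        if positions:
            hits[enzyme] = positions
    return hits


report = {name: find_sites(seq) for name, (seq, _) in PRIMERS.items()}
for name, (_, expected) in PRIMERS.items():
    missing = [enzyme for enzyme in expected if enzyme not in report[name]]
    print(name, report[name], "missing:", missing)
```

Each primer carries its sites directly after a three-nucleotide 5' clamp, so that digestion of the PCR products leaves SalI and XhoI ends for joining the promoter, zeocin resistance gene and terminator fragments.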

The digested CaMV35S promoter, zeocin resistance gene and nos terminator
sequences are ligated together with T4 DNA ligase. The ligated product is diluted
1:100 in sterile water and the whole ligation product is re-amplified with the
CaMV35SF and NosR primers. The resulting PCR product is digested with EcoRI
and SpeI, purified and ligated with pMSVLSB-3 that is pre-digested with EcoRI and
SpeI. The ligation reaction is used to transform E. coli competent cells.
Transformants are selected on Luria Agar plates containing both kanamycin (100
µg/ml) and zeocin (50 µg/ml) to select for colonies containing the CaMV35S
promoter-zeocin resistance gene-nos terminator cassette inserted into pMSVLSB-3
(Figure 6 and SEQ ID NO:11). Colonies putatively containing the correct plasmid
are chosen, and plasmid DNA is isolated and screened by digestion with EcoRI and SpeI.
One plasmid designated pMSVLSB-4 (Figure 7 and SEQ ID NO:12) is selected.

One of the methods in the art for construction of infectious clones of
geminivirus genomes is to clone tandemly duplicated sequences of the geminivirus
genome, with at least the LIR duplicated. This allows the virus sequence to escape
from the cloning vector in planta by a replicative release mechanism. The virus Rep
protein is transiently expressed in transfected cells, and induces a nick at each of the
stem loop sequences contained within the origin of replication in the LIR. Rolling
circle replication is initiated at each nick point, and this results in release of a
ssDNA copy of the virus replicon, which is circularized by the Rep protein, and
which then replicates autonomously in the plant cell nucleus. The XbaI/SpeI
fragment from pMSVLSB-3, containing the viral LIR and Rep genes, is inserted into
the unique SpeI site in pMSVLSB-4 to create pMSVLSB-5 (Figure 8 and SEQ ID
NO:13). The zeocin resistance gene is deleted by digestion with NotI; the DNA is
recircularized and used to transform E. coli to kanamycin resistance with a new
vector, pMSVLSB-6 (Figure 9 and SEQ ID NO:14). When the vector is introduced
into plant cells, a monomeric copy of the insert is released by replicative release
(described above) and replicates autonomously as construct MSVLSB-6 in the
nuclei of infected cells.

The restriction map of construct MSVLSB-6 is shown in Figure 3; this
genetic construct possesses the following features: (a) the rep genes and origins of
replication from maize streak geminivirus that are necessary and sufficient for the
autonomous replication of the viral construct and its associated foreign DNA in the
host plant cell; and (b) two overlapping transcription units present in the DNA replicon.
The two overlapping transcription units are arranged according to the configuration
shown in Figure 2. With reference to Figure 2, "promoter 1" and "terminator 1" in
MSVLSB-6 are the MSV mp promoter and transcription termination signals present
in the SIR, respectively, and "promoter 2" and "terminator 2" are the CaMV 35S
RNA promoter and nos terminator sequences, respectively. The two overlapping
transcription units share three unique restriction sites (SalI, PacI and NotI) and one
non-unique restriction site (XhoI) where foreign DNA may be inserted so that it may
be transcribed by both promoters to yield at least a partially double stranded RNA
duplex of the foreign DNA sequence. 
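
To illustrate why transcription of one insert from two opposing promoters can yield a double-stranded RNA, the sketch below transcribes a short DNA insert from each strand and checks that the two transcripts are reverse complements of one another. The insert sequence and the function names are hypothetical and chosen only for illustration; they are not sequences of the vector.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_dna(seq):
    """Return the opposite DNA strand, written 5'->3'."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def transcribe(coding_strand):
    """An RNA has the sequence of the coding (non-template) strand, with U for T."""
    return coding_strand.replace("T", "U")


def can_base_pair(rna_a, rna_b):
    """True if rna_b is the exact reverse complement of rna_a."""
    return rna_a == rna_b.translate(RNA_COMPLEMENT)[::-1]


insert_top = "ATGGCTTCAGGTACCTTGA"  # hypothetical insert, top strand 5'->3'

# "Promoter 1" reads the top strand as its coding strand (sense transcript).
sense_rna = transcribe(insert_top)
# "Promoter 2" faces the opposite way, so the bottom strand is its coding strand.
antisense_rna = transcribe(reverse_complement_dna(insert_top))

print(sense_rna)
print(antisense_rna)
print(can_base_pair(sense_rna, antisense_rna))
```

Because both transcripts derive from the same inserted DNA, they are complementary over the full length of the insert regardless of which orientation the insert adopts at the shared cloning site.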

Example 2: Use of recombinant vectors to inhibit or silence gene expression
in cereal crops:

Application of pMSVLSB-6 in inhibition of Dwarf1 gene expression in rice

The vector pMSVLSB-6 exemplified above can be employed to inhibit
expression of any endogenous gene in a variety of plant host cells. By way of
illustration, the rice gene Dwarf1 is inhibited to reproduce the known mutant phenotype
using a pMSVLSB-6 containing a fragment of the coding sequence of Dwarf1
(GenBank accession number AB028602). The gene is amplified from cDNA
isolated from rice seedlings. Primer sequences are designed to have homology with
the published sequence of Dwarf1 (Ashikari et al. (1999) PNAS U.S.A. 96:10284-10289).
The primer sequences contain NotI restriction sites at their 5' ends. The
PCR product is digested with NotI and cloned into the NotI site of pMSVLSB-6 to
generate pMSVLSB-6::dwarf1s and pMSVLSB-6::dwarf1a, with the insert cloned
in the sense and antisense orientation with respect to the MSV mp promoter,
respectively. The XbaI-SpeI fragment from each of these plasmids is transferred
into an Agrobacterium binary vector that is commonly used for rice transformation.
This vector is used to transform electrocompetent Agrobacterium strain LBA4404
(Life Technologies). Agrobacterium cultures containing the appropriate plasmids
are used in transformation of rice. Transgenic rice is generated by standard
protocols (see, e.g., US Patent 5,591,616). The transgenic rice plants display
phenotypes similar to the dwarf1 mutant described by Ashikari et al. (1999) supra: they are
gibberellin-insensitive, dwarfed in comparison with un-silenced transgenic controls,
and have broad, dark green leaves, compact panicles and short, round grains.

Application of pMSVLSB-6 in inhibition of phytoene desaturase expression
in maize seedlings

The coding sequence for the maize phytoene desaturase gene (pds), having
the GenBank accession number U37285, is amplified from cDNA made from RNA
isolated from four-day-old maize seedlings of the cultivar "Golden Cross Bantam".
The primers used for amplification of this cDNA have the following sequences
containing PacI sites (italicized) at the 5' ends:

zeapds1330:
TTTTTAATTAAAGGTCCGCCTGAATTCTCG (SEQ ID NO:7)

zeapds1873:
TTTTTAATTAACGGCAAGGCTCACAGTTTG (SEQ ID NO:8)
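
The contribution of the 5' PacI tails to the length of the PCR product can be worked out with a short script. This is a sketch under stated assumptions: the primer strings are the pds primer sequences as reconstructed from the text, the helper name `tail_length` is hypothetical, and reading the numerals in the primer names (1330 and 1873) as template coordinates is an assumption that the text does not state.

```python
PACI_SITE = "TTAATTAA"  # PacI recognition sequence

# pds primer sequences as reconstructed from the text.
PDS_PRIMERS = {
    "zeapds1330": "TTTTTAATTAAAGGTCCGCCTGAATTCTCG",
    "zeapds1873": "TTTTTAATTAACGGCAAGGCTCACAGTTTG",
}


def tail_length(primer, site=PACI_SITE):
    """Length of the 5' tail: everything up to and including the restriction site."""
    start = primer.find(site)
    if start < 0:
        raise ValueError("restriction site not found in primer")
    return start + len(site)


tails = {name: tail_length(seq) for name, seq in PDS_PRIMERS.items()}
added_by_tails = sum(tails.values())

REPORTED_PRODUCT_BP = 565          # product size stated in this example
ASSUMED_COORD_SPAN = 1873 - 1330   # assumption: primer names encode coordinates

print(tails)
print("template-derived length:", REPORTED_PRODUCT_BP - added_by_tails)
print("assumed coordinate span:", ASSUMED_COORD_SPAN)
```

Under that assumption, the two 11-nucleotide tails account for the difference between the coordinate span and the 565 base pair product reported for this amplification.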

PCR amplification with these primers and cDNA made from RNA isolated
from maize seedlings yields a product of 565 base pairs, which is then digested with
PacI. The progenitor plasmid to pMSVLSB-6, pMSVLSB-5, is digested with XbaI
and SpeI to release the MSV and associated overlapping transcription unit sequences
from the pZeRO-2 cloning vector as a single 4816 base pair fragment. This
fragment is cloned into the Agrobacterium binary vector pBin19 (GenBank:
U09365) digested with XbaI to yield pMSVLSB-7. The plasmid pMSVLSB-7 is
digested with PacI and the pds PCR fragment is inserted into this position,
generating plasmids pMSVLSB-7::pds1 (cloned in the sense orientation with respect
to the MSV mp promoter) and pMSVLSB-7::pds2 (cloned in the antisense
orientation with respect to the MSV mp promoter). These two plasmids are each
introduced into Agrobacterium strain C58C1(pMP90) (Koncz and Schell, 1985) by
electroporation. The Agrobacterium containing the binary vector plasmids is grown
overnight in Luria Bertani medium containing appropriate selective antibiotics. The
bacterial suspension is loaded into a 100 µl Hamilton syringe and injected into
three-day-old maize seedlings (cultivar Golden Cross Bantam) according to methods
described by Escudero et al. (1994) in the chapter "Agroinfection" of The Maize
Handbook, Freeling M, Walbot V (eds). Plants that are successfully agroinfected
display a photobleaching phenotype on the first three leaves, similar to that induced
by spraying the plants with the phytoene desaturase inhibitor norflurazon.

CLAIMS

What is claimed is: 

1. A eukaryotic recombinant vector comprising a viral replicon having two
overlapping transcription units arranged in an opposing orientation and flanking a
transgene of interest, wherein the two overlapping transcription units yield both
sense and antisense RNA transcripts from the same transgene in a eukaryotic host
cell.

2. The eukaryotic recombinant vector of claim 1, wherein each of the
overlapping transcription units comprises a promoter and a terminator.

3. The eukaryotic recombinant vector of claim 2, wherein the promoter is a
constitutive promoter.

4. The eukaryotic recombinant vector of claim 2, wherein the promoter is
an inducible promoter.

5. The eukaryotic recombinant vector of claim 2, wherein the promoter is a
tissue-specific promoter.

6. The eukaryotic recombinant vector of claim 1, wherein the promoter and
the terminator of the overlapping transcription units are arranged in a configuration
shown in Figure 2(a).

7. The eukaryotic recombinant vector of claim 1, wherein the promoter and
the terminator of the overlapping transcription units are arranged in a configuration
shown in Figure 2(b).

8. The eukaryotic recombinant vector of claim 1, wherein the promoter and
the terminator of the overlapping transcription units are arranged in a configuration
shown in Figure 2(c).

9. The eukaryotic recombinant vector of claim 1, wherein the promoter and
the terminator of the overlapping transcription units are arranged in a configuration
shown in Figure 2(d).

10. The eukaryotic recombinant vector of claim 1 that inhibits gene
expression of the eukaryotic host cell.

11. The eukaryotic recombinant vector of claim 1, wherein the eukaryotic
host cell is selected from the group consisting of fungus, yeast cell, plant cell and
animal cell.

12. The eukaryotic recombinant vector of claim 1 that inhibits expression of
an endogenous gene present in the host cell, wherein the endogenous gene is
substantially homologous to the transgene contained in the overlapping transcription
units.

13. The eukaryotic recombinant vector of claim 12, wherein the endogenous
gene is native to the host cell.

14. The eukaryotic recombinant vector of claim 12, wherein the endogenous
gene is heterologous to the host cell.

15. The eukaryotic recombinant vector of claim 12, wherein the endogenous
gene is a pathogenic gene derived from one or more members of the group
consisting of virus, bacterium, fungus, and protozoa.

16. The eukaryotic recombinant vector of claim 1, wherein expression of the
transgene to yield double-stranded RNA transcripts confers a phenotypic change in
the eukaryotic host cell.

17. The eukaryotic recombinant vector of claim 1, wherein the transgene
encodes a protein selected from the group consisting of a membrane protein, a
cytosolic protein, a secreted protein, a nuclear protein, and a chaperone protein.

18. The eukaryotic recombinant vector of claim 1 that is an autonomously
replicating vector.

19. The eukaryotic recombinant vector of claim 1, wherein the viral replicon
is derived from a DNA virus.

20. The eukaryotic recombinant vector of claim 19, wherein the DNA virus
is selected from the group consisting of Geminivirus, Caulimoviridae,
Badnaviridae, Circoviridae, Circinoviridae, Parvoviridae, Papovaviridae,
Polyomaviridae, Adenoviridae, Herpesviridae, Poxviridae, Iridoviridae,
Baculoviridae, Hepadnaviridae, Retroviridae, Gyrovirus, Nanovirus, and African
Swine Fever virus.

21. A host cell transformed with a vector of claim 1 or 10.

22. The host cell of claim 21 that is a eukaryotic cell selected from the group
consisting of fungus, yeast cell, plant cell and animal cell.

23. A transgenic plant comprising a eukaryotic recombinant vector of claim
1 or 10.

24. The transgenic plant of claim 23 exhibiting reduced expression of an
endogenous gene that is substantially homologous to the transgene contained in the
eukaryotic recombinant vector.

25. A kit for generating a double-stranded RNA transcript in a eukaryotic
cell comprising a eukaryotic recombinant vector of claim 1 in suitable packaging.

26. A method of inhibiting expression of an endogenous gene present in a
eukaryotic cell, comprising:

(a) providing a eukaryotic recombinant vector of claim 12;

(b) introducing the eukaryotic recombinant vector into the eukaryotic
cell;

(c) culturing the eukaryotic cell of (b) under conditions favorable for
expression of both sense and antisense RNA transcripts from the
transgene that is contained in the transcription units of the vector, and
thereby inhibiting expression of the corresponding endogenous gene
in the eukaryotic cell.

27. The method of claim 26, wherein the endogenous gene is native to the
host cell.

28. The method of claim 26, wherein the endogenous gene is heterologous to
the host cell.

29. The method of claim 26, wherein the endogenous gene is a pathogenic
gene derived from one or more members of the group consisting of virus, bacterium,
fungus, and protozoa.

30. The method of claim 26, wherein inhibition of the endogenous gene
confers a phenotypic change in the host cell.

31. The method of claim 26, wherein the host eukaryotic cell is selected from
the group consisting of fungus, yeast cell, plant cell, and animal cell.

32. The method of claim 26, wherein the eukaryotic recombinant vector is an
autonomously replicating vector.

33. The method of claim 26, wherein the eukaryotic recombinant vector
comprises a viral replicon derived from a DNA virus.

34. The method of claim 26, wherein the DNA virus is selected from the
group consisting of Geminivirus, Caulimoviridae, Badnaviridae, Circoviridae,
Circinoviridae, Parvoviridae, Papovaviridae, Polyomaviridae, Adenoviridae,
Herpesviridae, Poxviridae, Iridoviridae, Baculoviridae, Hepadnaviridae,
Retroviridae, Gyrovirus, Nanovirus, and African Swine Fever virus.

35. The method of claim 26, wherein the eukaryotic recombinant vector
comprises two overlapping transcription units, wherein each transcription unit
comprises a promoter and a terminator.

36. The method of claim 26, wherein the promoter is a constitutive promoter.

37. The method of claim 26, wherein the promoter is an inducible promoter.

38. The method of claim 26, wherein the promoter is a tissue-specific
promoter.

39. The method of claim 35, wherein the promoter and the terminator of the
overlapping transcription units are arranged in a configuration shown in Figure 2(a).

40. The method of claim 35, wherein the promoter and the terminator of the
overlapping transcription units are arranged in a configuration shown in Figure 2(b).

41. The method of claim 35, wherein the promoter and the terminator of the
overlapping transcription units are arranged in a configuration shown in Figure 2(c).

42. The method of claim 35, wherein the promoter and the terminator of the
overlapping transcription units are arranged in a configuration shown in Figure 2(d).

43. A method of identifying a biological function(s) of an endogenous gene
of interest in a eukaryotic cell by selectively inhibiting the expression of the
endogenous gene, the method comprising:

(a) providing a eukaryotic recombinant vector of claim 12;

(b) introducing the eukaryotic recombinant vector of (a) into the
eukaryotic cell;

(c) culturing the eukaryotic cell of (b) under conditions favorable for
expression of both sense and antisense RNA transcripts from the
transgene contained in the eukaryotic recombinant vector and thereby
inhibiting expression of the endogenous gene in the eukaryotic cell;
and

(d) determining one or more phenotypic changes in the eukaryotic cell
that correlate with the inhibited expression of the endogenous gene,
thereby identifying the biological function(s) of the endogenous gene
in the eukaryotic cell.

44. The method of claim 43, wherein the eukaryotic cell is selected from
the group consisting of fungus, yeast cell, plant cell, and animal cell.

45. The method of claim 43, wherein the eukaryotic cell is a plant cell.

46. The method of claim 43, wherein the eukaryotic cell is an animal cell.

Figure 4

Composition 1161 A; 1260 C; 1251 G; 1209 T; 0 OTHER
Percentage: 24% A; 26% C; 26% G; 25% T; 0% OTHER

Molecular Weight (kDa) ssDNA: 1506.65 dsDNA: 3009.2
ORIGIN

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC
61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTQAGCGCAA CGCAATTAAT GTGAGTTAGC
121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA
181 TTCTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT
241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TCGTACCGAG CTOGGATCCA
301 CTAGTAACGG CCGCCAGTOT GCTGGAATTC ATGGGCAGAC CCCTCTGXAC TTTAAGAGTG
361 TTGGCAACCA GTAATGAATA AAAACTCCCG TTTTATTATA TTTGATGAAT GCTGAAAGCT
421 TACATTAATA TGTCGTGCGA TCGCACGAAA AAACACACGC AAACAATACA GGGGGGTA&T
481 CGGCGGGCGG CTAAGGGTGG TGCTCGGCGG GCAGAACAfC GAAAAATCAA GATCTATATG
541 AATTACACTT CCTCCGTAGG AGGAAGGACA GGGGGAGAAT ACCACTTCTC CCCCGGCGAC
601 ATAATGTAAA TGACGCAGTT TGCCTCGAAA TACTCCAGCT GCCCTCGAGT CATTTCCTTC
661 ATCCAATCTT CATCCGAGTT GGCGAGGATT ATTGTAGGCT TAGACTTCTT CTGCACCTTT
721 TTCXTCTXAC CATACTTGGG GTTTACAATG AAATCCCTCT GACAGCCAAC TAACTGTTTC
781 CAACAAGGAC AGAATTTAAA CGGAATATCA TCTACGATGT TGTAGATTGC GTCTTCGTTG
841 TATGAAGACC AATCAACATT ATTTTGCCAG TAATTATGAA CCCOTAGGOT TCTGGCCCAA
901 GTAGATTTTC CGGTTCTTGT TGGGCCGACO ATGTAGAGGC TCTGCTTTCT TGATCTTTCA
961 TCTGATGACT GGATACAGAA TCCATCCATT GGAGGTCAGA AATTGCATCC TCGAGGGTAT
1021 AACAGCTAGG TTGAAGGAGC ATCTAAGCTT CGGGACXAAC CTGGAAGATG TTAGGCTGGA
1081 GCCAATCGTT GATTGACTCA TTACAAAGTA AATCAGGTGA GGAGGGTGGA TGAGGATTGG
1141 TGAACTCTTC CTGAATCTCA GGAAAAAGCT TATTTGCAGA GTATTCAAAA TACTGC ftATT
1201 TTG1GGACCA ATCAAAGGGG AGCTCTTTCT GGATCATGGA GAGGTACyCT TCTTTOGAGG
1261 TAGCGTGTGA AATAATGTCT CGCATTATTT CATCTTTAGA AGGCTTTTTT TCCTTTACCT
1321 CTGAATCAGA TTrtCCTAGG AAGGGGGACT TCCTAGGAAT GAAAGTACCT CTCTCAAACA
1381 CAGCCAGAGG TTCCTTGAGA ATGTAATCCC TCACTCTGTT AACTGACTTG GCACTCTGAA
1441 TATTTGGGTO AAACCCATTT ATATCAAAGA ACCTTGAGTC AGATATCCTT ATCGGCTTCT
1501 CTGGCTGAAG CAATGCATGT AAATGCAAAC TTCCATCTTT ATGTGCCTCT CGGGCACATA
1561 GAATATATTT GGGAATCCAA CGAACGACGA GCTCCCAGAT CATCTGACAG GCGATTTCAG
1621 GATTTTCTGG ACACTTTGGA TAGGTTAGGA ACGTGTTAGC GTTCCTGTGT GAGAACTGAC
1681 GGTTGGATGA GGAGGAGGCC ATAGCCGACG ACGGAGGTTG AGGCTGAGGG ATGGCAGACT
1741 GGGAGCTCCA AACTCTATAG TATACCCGTG CGCCTTCGAA ATCCGCOGCT CCA3TCTCTT
1801 ATAG1GGTTG TAAATGGGCC GGACCGGGCC GGCCCAGCAG GAAAAGAAGG CGCGCACTAA
1861 TATTACCGCG CCTTCTTTTC CTGCGAGGGC CCGGTAGGGA CCGAGCGCTT TGATTTAAAG
1921 CCTGGTTCTG CTTTGCGGCC GCTCGAGCAT GCATCTAGAG GGCCCAATTC GCCCTATAGT
1981 GAGTCGTATT ACAATTCACT GGCCGTCGTT TTACAACGTC GTGACTGGGA AAACCCTGGC
2041 GTTACCCAAC TTAATCGCCT TGCAGCACAT CCCCCTTTCG CCAGCTGGOG ^AATAGCGAA
2101 GAGGCCCGCA CCGATCGCCC TTCCCAACAG TTGCGCAGOC TATACGTACG GCAGTTTAAG
2161 GTTTACACCT ATAAAAGAGA GAGCCGTTAT CGTCTGTTTG TGGATGTACA GAGTGATATT
2221 ATTGACACGC CGGGGCGACG GATGGTGATC CCCCTGGCCA GTGCACGTCT GCTGTCAGAT
2281 AAAGTCTCCC GTGAACTTTA CCCGGTGGTG CATATCGGGG ATGAAAGCTG GCGCATGATG
2341 ACCACCGATA TGGCCAGTGT GCCGGTCTCC GTTATCGGGG AAGAAGTGGC TGATCTCAGC
2401 CACCGCGAAA ATGACATCAA AAACGCCATT AACCTGATGT TCTGGGGAAT ATAAATGTCA
2461 GGCCTGAATG GCGAATGGAC GCGCCCTGTA GCGGCGCATT AAGCGCGCGG GTGTGGTGGT
2521 TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT
2581 CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC GGGGGCTCCC
2641 TTTAGGGTTC CGATTTAGAG CTTTACGGCA CCTCGACCGC AAAAAACTTG ATTTGGGTGA
2701 TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA CGTTGGAGTC
2761 CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC CTATCGCGGT
2821 CTATTCTTTT GATTTATAAG GGATGTTGCC GATTTCGGCC TATTGGTTAA AAAATGAGCT
2881 GATTTAACAA AAATTTTAAC AAAATTCAGA AGAACTCGTC AAGAAGGCGA TAGAAGGCGA
2941 TGCGCTGCGA ATCGGGAGCG GCGATACCGT AAAGCACGAG GAACCGGTCA GCCCMTCGC

SSSSSc* ?tcagcaata tcacgggtag ccaacgctat gtcctgatag cggtccgcca 

SSSgS GCCACAGTCG ATGAATCChG AAAAGCGGCC ATTTTCCACC AT6ATAXTCG 
ATOGCCATGG GTCACGACGA GATCCICGCC GTCGGGCATG CTOGCCTTGA 

Sgctcggct ggcgcgagcc CCTGATGCTC ttcgtccaga tcatcctoat 
ggcttccatc cgagtacgto ctcgctcgai gcgatgtttc gcttggtggt 
£££££££ SSccgga TCAAGCGTAT gcagccgcog cattgcatca gccatgatgg 
SSSS SSSSca agotgagaxg acaggagatc ctgcccoggc acttoqccca 

3361 AXACTTTCTC GGCAeUAV^H ^^.^^^ CAACGTCGAG CACAGCTGCG CAAGGAACGC 

ItH SSSS? SS cScS?^ CAG^C^TC AGGGCACCGG 

Itll SSSS S?gSS£a agaaccgggc gcccctgcgc tgacagccgg aacacggcgg 
SE SScaSSS ScgaSgtc tcttctcccc agtcatagcc gaatagcctc tccacccaag 

It A SSSS XCCTGCGTGC AATCCATCTT GTTCAATCAT GCGAAACGAT CCTCATCCTG 

™ SSSStc aStcttgat cccctccgcc atcagatcct tggcggcgag aaagccatcc 
HI 1 , JSSStt gcagScttc ccaacctiac cagagggcgc CCGAGCTGGC aattccggtt 

llti SSSJS ctttoScxt Gccrrmtec ttotocagat agcccagtag cxgacattca 

J5S SSS GCAOCGTITC TGCGGACTGG CTTTCTACGT GAAAAGGATC TAGGTGAAGA 

l a tl SS Stctcato accaaaatcc cttaacgtga gttttcgttc cactgagcgt 

till SSSSt AGAaSgATC -AAAGGATCTT .CTTCAGATCC TTTTXTICTG CGOGXWTCT 
till SSSSS AACAAAAAAA CCACCGCTAC CAGCGGTGGT TTGTTlGCCG GATCAA^AGC 
tlA ?ScAACTCT TTTTCOGAAG GTAACTGGCT TCAGCAGAGC GCAGATACCA AATACTGTCC 
££ SSgISS ScgtStta GGCCACCACT TCAAGAACTC TGTAGCACOG cctacatacc 
SSSgct AATCcrcrrA ccagtcgctc ctcccagtgg cgataagtcg tgtcttaccg 

till GGTTOGACTC AAG^ACGATAG TTAGOGGAT3V AGGCGCAGOG GTCGGGCTGA ACGGGGGCTT 

EE ^SgS-gcc^ag™ gagcgaacga cciacsccga actgagatac CTACAGCGTG 
till aagSSg ctcccgaag ggagaaaggc ggacaggtat ccggtaagcg 

«« SggtoS arcaggagag cgcacgaggg agcttccagg gggaaacgcc tggiatcttt 

till SSSSSSgI^S^ 

till SgSaS SaSS ^^gca acg^c^ ^acg^ ctg^tt^ 

* * -rTMCVTTT TBCTCACATG. TTCTTTCCTG CGTXATCCCC TGATTCTGFTG GATAACCGTA 

SE ?SSgSS SSSSS gataccgctc gccgcagccg aacgaccgag cgcagcgagt 
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Figure 5 



pMSVLSB-2: 3413 bp; 

Composition 777 A; 950 C; 884 G; 802 T; 0 OTHER 
Percentage: 23% A; 28% C; 26% G; 23% T; 0% OTHER 

Molecular Weight (kDa) : ssDNA: 1052.40 dsDNA: 2104.2 

ORIGIN AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 

181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 

241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGGCCCGGT AGGGACCGAG 

301 CGCTTTGATT TAAAGCCTGG TTCTGCTTTG TATGATTTAT CTAAAGCAGC CCAATCTAAA 

361 GAAACCGGTC CCGGGCACTA TAAATTGCCT AACAAGTGCG ATTCATTCAT GGATGCTTTA 

421 AACTCGAGTC TAGAGGGCCC GAATTCTGCA GATATCCATC ACACTGGCGG CCGCTCGAGC 

481 ATGCATCTAG AGGGCCCAAT TCGCCCTATA GTGAGTCGTA TTACAATTCA CTGGCCGTCG 

541 TTTTACAACG TCGTGACTGG GAAAACCCTG GCGTTACCCA ACTTAATCGC CTTGCAGCAC 

601 ATCCCCCTTT CGCCAGCTGG CGTAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC 

661 AGTTCCGCAG CCTATACGTA CGGCAGTTTA AGGTTTACAC CTATAAAAGA GAGAGCCGTT 

721 ATCGTCTGTT TGTGGATGTA CAGAGTGATA TTATTGACAC GCCGGGGCGA CGGATGGTGA 

781 TCCTCCTGGC CAGTGCACGT CTGCTCTCAG ATAAAGTCTC CCGTGAACTT TACCCGGTGG 

841 TCCATATCGG GGATGAAAGC TCGCGCATGA TGACCACCGA TATGGCCAGT GTGCCGGTCT 

901 CCGTTATCGG GGAAGAAGTG GCTGATCTCA GCCACCGCGA AAATGACATC AAAAACGCCA 

961 TTAACCTGAT GTTCTGGGGA ATATAAATCT CAGGCCTGAA TGGCGAATGG ACGCGCCCTG 

1021 TAGCGGCGCA TTAAGCGCGC GGGTGTGGTG GTTACGCGCA GCGTGACCGC TACACTTGCC 

1081 AGCGCCCTAG CGCCCGCTCC TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC GTTCGCCGGC 

1141 TTTCCCCGTC AAGCTCTAAA TCGGGGGCTC CCTTTAGGGT TCCGATTTAG AGCTTTACGG 

1201 CACCTCGACC GCAAAAAACT TGATTTGGGT GATGGTTCAC GTAGTGGGCC ATCGCCCTGA 

1261 TAGACGGTTT TTCGCCCTTT GACGTTGGAG TCCACGTTCT TTAATAGTGG ACTCTTGTTC 

1321 CAAACTGGAA CAACACTCAA CCCTATCGCG GTCTATTCTT TTGATTTATA AGGGATGTTG 

1381 CCGATTTCGG CCTATTGGTT AAAAAATGAG CTGATTTAAC AAAAATTTTA ACAAAATTCA 

1441 GAAGAACTCG TCAAGAAGGC GATAGAAGGC GATGCGCTGC GAATCGGGAG CGGCGATACC 

1501 GTAAAGCACG AGGAAGCGGT CAGCCCATTC GCCGCCAAGC TCTTCAGCAA TATCACGGGT 

1561 AGCCAACGCT ATGTCCTGAT AGCGGTCCGC CACACCCAGC CGGCCACAGT CGATGAATCC 

1621 AGAAAAGCGG CCATTTTCCA CCATGATATT CGGCAAGCAG GCATCGCCAT GGGTCACGAC 

1681 GAGATCCTCG CCGTCGGGCA TGCTCGCCTT GAGCCTGGCG AACAGTTCGG CTGGCGCGAG 

1741 CCCCTGATGC TCTTCGTCCA GATCATCCTG ATCGAGAAGA CCGGCTTCCA TCCGAGTACG 

1801 TGCTCGCTCG ATCCGATGTT TCGCTTGGTG GTCGAATGGG CAGGTAGCCG GATCAAGCGT 

1861 ATGCAGCCGC CGCATTGCAT CAGCCATGAT GGATAGTTTC TCGGCAGGAG CAAGGTGAGA 

1921 TGACAGGAGA TCCTGCCCCG GCACTTCGCC CAATAGCAGC CAGTCCCTTC CCGCTTCAGT 

1981 GACAACGTCG AGCACAGCTG CGCAAGGAAC GCCCGTCGTG GCCAGCCACG ATAGCCGCGC 

2041 TGCCTCGTCT TGCAGTTCAT TCAGGGCACC GGACAGGTCG GTCTTGACAA AAAGAACCGG 

2101 GCGCCCCTCC GCTGACAGCC GGAACACGGC GGCATCAGAG CAGCCGATTG TCTGTTGTGC 

2161 CCAGTCATAG CCGAATAGCC TCTCCACCCA AGCGGCCGGA GAACCTGCGT GCAATCCATC 

2221 TTGTTCAATC ATGCGAAACG ATCCTCATCC TGTCTCTTGA TCAGATCTTG ATCCCCTGCG 

2281 CCATCAGATC CTTGGCGGCG AGAAAGCCAT CCAGTTTACT TTGCAGGGCT TCCCAACCTT 

2341 ACCAGAGGGC GCCCCAGCTG GCAATTCCGG TTCGCTTGCT GTCCATAAAA CCGCCCAGTC 

2401 TAGCTATCGC CATGTAAGCC CACTGCAAGC TACCTGCTTT CTCTTTGCGC TTGCGTTTTC 

2461 CCTTGTCCAG ATAGCCCAGT AGCTGACATT CATCCGGGGT CAGCACCGTT TCTGCGGACT 

2521 GGCTTTCTAC GTGAAAAGGA TCTAGGTGAA GATCCTTTTT GATAATCTCA TGACCAAAAT 

2581 CCCTTAACGT GAGTTTTCGT TCCACTGAGC GTCAGACCCC GTAGAAAAGA TCAAAGGATC 

2641 TTCTTGAGAT CCTTTTTTTC TGCGCGTAAT CTGCTGCTTG CAAACAAAAA AACCACCGCT 

2701 ACCAGCGGTG GTTTGTTTGC CGGATCAAGA GCTACCAACT CTTTTTCCGA AGGTAACTGG 

2761 CTTCAGCAGA GCGCAGATAC CAAATACTGT CCTTCTAGTC TAGCCGTAGT TAGGCCACCA 

2821 CTTCAAGAAC TCTGTAGCAC CGCCTACATA CCTCGCTCTG CTAATCCTGT TACCAGTGGC 

2881 TGCTGCCAGT GGCGATAAGT CGTGTCTTAC CGGGTTGGAC TCAAGACGAT AGTTACCGGA 
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TAAGGCGCAG CGGTCGGGCT GAACGGGGGG TTCGTGCACA CAGCCCAGCT TGGAGCGAAC 
3001 GACCTACACC GAACTGAGAT ACCTACAGCG TGAGCTATGA GAAAGCGCCA CGCTTCCCGA 
3061 AGGGAGAAAG GCGGACAGGT ATCCGGTAAG CGGCAGGGTC GGAACAGGAG AGCGCACGAG 
3121 GGAGCTTCCA GGGGGAAACG CCTGGTATCT TTATAGTCCT GTCGGGTTTC GCCACCTCTG 
3181 ACTTGAGCGT CGATTTTTCT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG 

3241 CAACGCGGCC TTTTTACGGT TCCTCGGCTT TTGCTGGCCT TTTGCTGACA TGTTCTTTCC 
3301 TGCGTTATCC CCTGATTCTG TGGATAACCG TATTACCGCC TTTGAGTGAG CTGATACCGC 
3361 TCGCCGCAGC CGAACGACCG AGCGCAGCGA GTCAGTGAGC GAGGAAGCGG AAG 
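
The composition and percentage lines in the pMSVLSB-2 header can be cross-checked with a few lines of arithmetic. The following is a minimal sketch for checking purposes only and is not part of the original figure; the names `LISTED_COUNTS`, `LISTED_LENGTH`, `whole_percentages` and `count_bases` are illustrative assumptions. It confirms that the listed base counts sum to the stated length, derives whole-number percentages from them, and counts bases in a line of listing text while ignoring position numbers and spacing.

```python
from collections import Counter

# Base counts and length exactly as listed in the pMSVLSB-2 header.
LISTED_COUNTS = {"A": 777, "C": 950, "G": 884, "T": 802}
LISTED_LENGTH = 3413


def whole_percentages(counts):
    """Return each base's share of the total, rounded to a whole percent."""
    total = sum(counts.values())
    return {base: round(100 * n / total) for base, n in counts.items()}


def count_bases(listing):
    """Count A/C/G/T in listing text, ignoring numbers, spaces and case."""
    return Counter(ch for ch in listing.upper() if ch in "ACGT")


if __name__ == "__main__":
    # Does the composition line account for every base in the plasmid?
    print(sum(LISTED_COUNTS.values()) == LISTED_LENGTH)
    # Percentages derived from the counts, for comparison with the header.
    print(whole_percentages(LISTED_COUNTS))
    # Per-line count; a full 60-base row should total 60.
    print(count_bases("61 ACGACAGGTT TCCCGACTGG"))
```

Applied to a whole transcribed listing, a `count_bases` total that differs from the stated length is a quick indication that characters were lost or misread in transcription.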
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Figure 6 



pMSVLSB-3 : 

pMSVLSB2 Apa fragment inserted: 4961 bp; 
Composition 1190 A; 1276 C; 1262 G; 1233 T; 0 OTHER 
Percentage: 24% A; 26% C; 25% G; 25% T; 0% OTHER 

Molecular Weight (kDa) : ssDNA: 1531.26 dsDNA: 3058.5 

ORIGIN AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 

181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 

241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGTACCGAG CTCGGATCCA 

301 CTAGTAACGG CCGCCAGTGT GCTGGAATTC ATGGGCAGAC CCGTCTGTAC TTTAAGAGTG 

361 TTGGCAACCA GTAATGAATA AAAACTCCCG TTTTATTATA TTTGATGAAT GCTGAAAGCT 

421 TACATTAATA TGTCGTGCGA TGGCACGAAA AAACACACGC AAACAATACA GGGGGGTAGT 

481 CGGCGGGCGG CTAAGGGTGG TGCTCGGCGG GCAGAACATC GAAAAATCAA GATCTATATG 

541 AATTACACTT CCTCCGTAGG AGGAAGCACA GGGGGAGAAT ACCACTTCTC CCCCGGCGAC 

601 ATAATGTAAA TGACGCAGTT TGCCTCGAAA TACTCCAGCT GCCCTGGAGT CATTTCCTTC 

661 ATCCAATCTT CATCCGAGTT GGCGAGGATT ATTGTAGGCT TAGACTTCTT CTGCACCTTT 

721 TTCTTCTTAC CATACTTGGG GTTTACAATG AAATCCCTCT GACAGCCAAC TAACTGTTTC 

781 CAACAAGGAC AGAATTTAAA CGGAATATCA TCTACGATGT TGTAGATTGC GTCTTCGTTG 

841 TATGAAGACC AATCAACATT ATTTTGCCAG TAATTATGAA CCCCTAGGCT TCTCGCCCAA 

901 GTAGATTTTC CGGTTCTTGT TGGGCCGACG ATGTAGAGGC TCTGCTTTCT TGATCTTTCA 

961 TCTGATGACT GGATACAGAA TCCATCCATT GGAGGTCAGA AATTGCATCC TCGAGGGTAT 

1021 AACAGGTAGG TTGAAGGAGC ATGTAAGCTT CGGGACTAAC CTGGAAGATG TTAGGCTGGA 

1081 GCCAATCGTT GATTGACTCA TTACAAAGTA AATCAGGTGA GGAGGGTGGA TGAGGATTGG 

1141 TGAACTCTTC CTGAATCTCA GGAAAAAGCT TATTTGCAGA GTATTCAAAA TACTGCAATT 

1201 TTGTGGACCA ATCAAAGGGG AGCTCTTTCT GGATCATGGA GAGGTACTCT TCTTTGGAGG 

1261 TAGCGTGTGA AATAATGTCT CGCATTATTT CATCTTTAGA AGGCTTTTTT TCCTTTACCT 

1321 CTGAATCAGA TTTTCCTAGG AAGGGGGACT TCCTAGGAAT GAAAGTACCT CTCTCAAACA 

1381 CAGCCAGAGG TTCCTTGAGA ATGTAATCCC TCACTCTGTT AACTGACTTG GCACTCTGAA 

1441 TATTTGGGTG AAACCCATTT ATATCAAAGA ACCTTGAGTC AGATATCCTT ATCGGCTTCT 

1501 CTGGCTGAAG CAATGCATGT AAATGCAAAC TTCCATCTTT ATGTGCCTCT CGGGCACATA 

1561 GAATATATTT GGGAATCCAA CGAACGACGA GCTCCCAGAT CATCTGACAG GCGATTTCAG 

1621 GATTTTCTGG ACACTTTGGA TAGGTTAGGA ACGTGTTAGC GTTCCTGTGT GAGAACTGAC 

1681 GGTTGGATGA GGAGGAGGCC ATAGCCGACG ACGGAGGTTG AGGCTGAGGG ATGGCAGACT 

1741 GGGAGCTCCA AACTCTATAG TATACCCGTG CGCCTTCGAA ATCCGCCGCT CCATTGTCTT 

1801 ATAGTGGTTG TAAATGGGCC GGACCGGGCC GGCCCAGCAG GAAAAGAAGG CGCGCACTAA 

1861 TATTACCGCG CCTTCTTTTC CTGCGAGGGC CCGGTAGGGA CCGAGCGCTT TGATTTAAAG 

1921 CCTGGTTCTG CTTTGTATGA TTTATCTAAA GCAGCCCAAT CTAAAGAAAC CGGTCCCGGG 

1981 CACTATAAAT TGCCTAACAA GTGCGATTCA TTCATGGATC CTTTAAACTC GAGTCTAGAG 

2041 GGCCCAATTC GCCCTATAGT GAGTCGTATT ACAATTCACT GGCCGTCGTT TTACAACGTC 

2101 GTGACTGGGA AAACCCTGGC GTTACCCAAC TTAATCGCCT TGCAGCACAT CCCCCTTTCG 

2161 CCAGCTGGCG TAATAGCGAA GAGGCCCGCA CCGATCGCCC TTCCCAACAG TTGCGCAGCC 

2221 TATACGTACG GCAGTTTAAG GTTTACACCT ATAAAAGAGA GAGCCGTTAT CGTCTGTTTG 

2281 TGGATGTACA GAGTGATATT ATTGACACGC CGGGGCGACG GATGGTGATC CCCCTGGCCA 

2341 GTGCACGTCT GCTGTCAGAT AAAGTCTCCC GTGAACTTTA CCCGGTGGTG CATATCGGGG 

2401 ATGAAAGCTG GCGCATGATG ACCACCGATA TGGCCAGTGT GCCGGTCTCC GTTATCGGGG 

2461 AAGAAGTGGC TGATCTCAGC CACCGCGAAA ATGACATCAA AAACGCCATT AACCTGATGT 

2521 TCTGGGGAAT ATAAATGTCA GGCCTGAATG GCGAATGGAC GCGCCCTGTA GCGGCGCATT 

2581 AAGCGCGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG 

2641 CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA 

2701 GCTCTAAATC GGGGGCTCCC TTTAGGGTTC CGATTTAGAG CTTTACGGCA CCTCGACCGC 

2761 AAAAAACTTG ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT 
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^ /-"nfCTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA 

282 1 CGCCCTTTGA CGTTGGAGTC ^TTCTTT GGATGTTGCC GATTTCGGCC 

2881 ACACTCAACC CTATCGCGGT CTATTCTTTT AAARTTCAGA AGAACTCGTC 

29 41 TATTGGTTAA AA^TGAGCT A^CGGGAGCG GCGATACCGT AAAGCACGAG 

3001 AAGAAGGCGA TAGAAGGCGA ^^GCTO^ XTCAGCAATA TCACGGGTAG CCAACGCTAT 

3061 QAAGOGGTCA GCCCATTCGC ^^^CTC QCCACAGTCG ATGAATCCAG. AAAAGCGGCC 

3121 GTCCTGATAG CGGTCCGCCA CACCXAGQPG J^^TCG GTCACGAOGA GATCCTCGCC 

3181 ATTTTCCACC ATGATATTCG G^AGGC GGCGCGAGCC CCTGATGCTC 

3241 GTCGGGCATG CTCGCCTTGA GCCTGGCGAA CGAGTACGTC CTCGCTCGAT 

3301 TTCGTCCAGA TCATCCTGAT SSgCCGGA TCAAGCGTAT GCAGCCGCCG 

3361 GCGATGTTTC GCTTGGTGGT CGAATGGGCA agGTGAGATG ACAGGAGATC 

3421 CATTGCATCA GCCATGATGG ATACTTTCTC GCTTCAGTGA CAACGTCGAG 

3481 CTGCCCCGGC CC8TCGTGGC SgCCACGAT AGCOGCGCTG CCTCGTCTTG 

3541 CACAGCTGCG CAAGGAACGC CCGTCGTGGC AGAACCGGGC GCCCCTGCGC 

3601 CA3TTCATTC AGGGCACOGG ACAGGTCGGT JgtTCTCCCC AGTCATAGCC 

3661 TGACAGCCGG AACACGGCGG CATCAGAGCA AATCCATCTX GTTCAATCAT 

3721 GAATAGCCTC TCCACC.CAAG CGGOCGGAGA A £-£"^ CCCCTGCGCC ATCAGATCCT 

3-781 GCGAAACGAT CCTCATCCTG ^CTCTTGATC AGATCTTGAT ^GGGCGC 

3641 TGGCGGCGAG AAAGCCATCC AGTTTACTTT GCCCAGTCTA GCTATCGCCA 

SMI CCOU3CTGGC WtTTCCGGTT fjfjj^ S^CC TTGTC^G^ 

3961 TGrAAGCCCA CTOCAAGCTA CCTGCTTTCT JSCOQRCJOa CTTTCXACGT 

4021 AGCCCAGTAG CTGACATTCA ^J^^ TAATCTCATG ACCAAAATCC CTTAACGTGA 

4061 GAAAAGGATC TAGGTOAAGA TCCTTTTTUA AAROGATCTT CTTGAGATCC 

4141 GTTTTCGTTC CACTGAGOGT ^gTCGT JGAAAAU^ ^^^c CAGCGGTGOT 

4201 TTTTTTTCTG CGCGTAATCT ^GCTTGCA ^ctoqcT TCAGCMAGC 

4261 TTGTTTGCCG GATCAAGAGC TACCAACTCT QGCCACCACT TCAAGAACTC 

4321 GCAGATACCA AATACTGTCC TTCTAGTGTA GjJJg^TTA eCAGTGGCTG CTGCCAOTGG 

4381 TGTAGGACCG CCTACATACC TCGCTCTGCT „ ACCGGATA AGGCGCAGCG 

4441 CGATAAGTCG TGTCTTACCG GGTTGGACTC J»™^^ QAGCGAACGA CCTACACCGA 

4501 GTCGGGCXGA ACGGGGGGTT CGT5CACACA GCCCAGCTTG ggaGARAGGC 

4561 ACTGAGATAC CTACAGCGTG A5CTATGAGA JJSC6CCACG fyscrtcCAS0 

4621 GGACAGGTAT GCGGTAAGCG G^GGGTCGG AACAGGAGAG XTGAGCGTCG 

4681 GGGAAACGCC TGGTATCTTT ATOGTCCTGT AACGCCAGCA ACGCGGCCTT 

4741 ATTTTTGTGA TGCTCGTGAG TGCTCACATG TTCTTTCCTG CGTTATCCCC 

4801 TTTACGGTTC CIGGGCTTTT GCTCGCCTTT GATACCGCTC GCCGCAGCCG 
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pMSVLSB4*. 6309 bp 
Composition 1522 A; 1620 C; 1590 G; 1577 T; 0 OTHER 
Percentage: 24% A; 26% C; 25% G; 25% T; 0% OTHER 

Molecular Weight (kDa) : ssDNA: 1947.08 dsDNA: 3889.6 

ORIGIN 

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 

181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 

241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGTACCGAG CTCGGATCCA 

301 CTAGTCCCGA TCTAGTAACA TAGATCACAC CGCGCGGGAT AATTTATCCT AGTTTGCGCG 

361 CTATATTTTG TTTTCTATCG CGTATTAAAT GTATAATTGC GGGACTCTAA TCATAAAAAC 

421 CCATCTCATA AATAACGTCA TGCATTACAT GTTAATTATT ACATGCTTAA CGTAATTCAA 

481 CAGAAATTAT ATGATAATCA TCGACAGACC GGCAACAGGA TTCAATCTTA AGAAACTTTA 

541 TTGCCAAATG TTTGAACGAT CGGGGAAATT CGCTCGAGTT AATTAAGCGG CXX5CCTCAAA 

601 AAGGATCTTC ACCTAGATCC TTTTAAATTA AAAATGAAGT TTTAGCACGT GTCAGTCCTG 

661 CTCCTCGGCC ACGAAGTGCA CGCAGTTGCC GGCCGGGTCG CGCAGGGCGA ACTCCCGCCC 

721 CCACGGCTGC TCGCCGATCT CGGTCATGGC CGGCCCGGAG GCGTCCCGGA AGTTCGTGGA 

781 CACGACCTCC GACCACTCGG CGTACAGCTC GTCCAGGCCG CGCACCCACA CCCAGGCCAG 

841 GGTGTTGTCC GGCACCACCT GGTCCTGGAC CGCGCTGATG AACAGGGTCA CGTCGTCCCG 

901 GACCACACCG GCGAAGTCGT CCTCCACGAA GTCCCGGGAG AACCCGAGCC GGTCGGTCCA 

961 GAACTCGACC GCTCCGGCGA CGTCGCGCGC GGTGAGCACC GGAACGGCAC TGGTCAACTT 

1021 GGCCATGGTG GCCCTCCTCA CGTGCTATTA TTGAAGCATT TATCAGGGTT ATTGTCTCAT 

1081 GAGCGGATAC ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT 

1141 TCCCCGAAAA GTGCCACCTG TATGCGGTGT GAAATACCGC ACAGATGCGT AAGGAGAAAA 

1201 TACCGCATCA GGCGAAATTG TAAACGCGGC CGCTTAATTA AGTCGACGTC CTCTCCAAAT 

1261 GAAATGAACT TCCTTATATA GAGGAAGGGT CTTGCGAAGG ATAGTGGGAT TGTGCGTCAT 

1321 CCCTTACGTC AGTGGAGATA TCACATCAAT CCACTTGCTT TGAAGACGTG GTTGGAACGT 

1381 CTTCTTTTTC CACGTAGCTC CTCGTGGGTG GGGGTCCATC TTTGGGACCA CTGTCGGCAG 

1441 AGGCATCTTG AACGATAGCC TTTCCTTATC GGAATGATGG CATTTGTAGG TGCCACCTTC 

1501 CTTTTCTACT GTCCTTTTGA TGAAGTGACA GATAGCTGGG CAATGGAATC CGAGGAGGTT 

1561 TCCCGATATT ACCCTTTGTT GAAAAGTCTC AATAGCCCTT TGGTCTTCTG AGACTGTATC 

1621 TTTGATATTC TTGGAGTAGA CGAGAGAGTG TCGTGCTCCA CCATGTTGAC GAATTCATGG 

1681 GCAGACCCGT CTGTACTTTA AGAGTGTTGG CAACCAGTAA TGAATAAAAA CTCCCGTTTT 

1741 ATTATATTTG ATGAATGCTG AAAGCTTACA TTAATATGTC GTGCGATGGC ACGAAAAAAC 

1801 ACACGCAAAC AATACAGGGG GGTAGTCGGC GGGCGGCTAA GGGTGGTGCT CGGCGGGCAG 

1861 AACATCGAAA AATCAAGATC TATATGAATT ACACTTCCTC CGTAGGAGGA AGCACAGGGG 

1921 GAGAATACCA CTTCTCCCCC GGCGACATAA TGTAAATGAC GCAGTTTGCC TCGAAATACT 

1981 CCAGCTGCCC TGGAGTCATT TCCTTCATCC AATCTTCATC CGAGTTGGCG AGGATTATTG 

2041 TAGGCTTAGA CTTCTTCTGC ACCTTTTTCT TCTTACCATA CTTGGGGTTT ACAATGAAAT 

2101 CCCTCTGACA GCCAACTAAC TGTTTCCAAC AAGGACAGAA TTTAAACGGA ATATCATCTA 

2161 CGATGTTGTA GATTGCGTCT TCGTTGTATG AAGACCAATC AACATTATTT TGCCAGTAAT 

2221 TATGAACCCC TAGGCTTCTG GCCCAAGTAG ATTTTCCGGT TCTTGTTGGG CCGACGATGT 

2281 AGAGGCTCTG CTTTCTTGAT CTTTGATCTG ATGACTGGAT ACAGAATCCA TCCATTGGAG 

2341 GTCAGAAATT GCATCCTCGA GGGTATAACA GGTAGGTTGA AGGAGGATGT AAGCTTCGGG 

2401 ACTAACCTGG AAGATGTTAG GCTGGAGCCA ATCGTTGATT GACTCATTAC AAAGTAAATC 

2461 AGGTGAGGAG GGTGGATGAG GATTGGTGAA CTCTTCCTGA ATCTCAGGAA AAAGCTTATT 

2521 TGCAGAGTAT TCAAAATACT GCAATTTTGT GGACCAATCA AAGGGGAGCT CTTTCTGGAT 

2581 CATGGAGAGG TACTCTTCTT TGGAGGTAGC GTGTGAAATA ATGTCTCGCA TTATTTCATC 

2641 TTTAGAAGGC TTTTTTTCCT TTACCTCTGA ATCAGATTTT CCTAGGAAGG GGGACTTCCT 

2701 AGGAATGAAA GTACCTCTCT CAAACACAGC CAGAGGTTCC TTGAGAATGT AATCCCTCAC 

2761 TCTGTTAACT GACTTGGCAC TCTGAATATT TGGGTGAAAC CCATTTATAT CAAAGAACCT 

2821 TGAGTCAGAT ATCGTTATCG GCTTCTCTGG CTGAAGCAAT GCATGTAAAT GCAAACTTCC 

2881 ATCTTTATGT GCCTCTCGGG CACATAGAAT ATATTTGGGA ATCCAACGAA CGACGAGCTC 
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Figure 7 (conra) 

™^*r^&TT TTCTGGACAC TTTGGATAGG TTAGGAACGT 

2941 CCAGATCATC TGACAGGCGA SSSSSg GAGGCCATAG CCGACGACGG 

300! GTTAGCGTTC CTGTGTGAGA ACTGACGGTT ^TS^ CTATAGTATA CCCGTGCGCC 

3061 , AGGTTGAGGC TGAGGGATGG CAGACTGGGA ^AAA^ .jggGCCGGAC CGGGCCGGCC 

3121 TTCGAAATCC GCCGCTCCAT TCTCTT™ * cnTTCCTCC GRGGGCCCGG 

3181 CAGCAGGAAA AGAAGGCGCG CACTAATATT ACCGCGCCTT TATCTAAAG C 

3241 GGTAGGGACC GAGCGCTTTG ATTTAAAGCC TGGTTCTGCT ™ GCGATTCATr 

3301 AGCCCAATCT AAAGAAACCG GTgCGGGCA ^MJAATTG ^ GTCGTATTAC 

3361 CATGGATCCT TTAAACTCGA GTCTAGAGGG ^CAA^i ^^^^ TACCCAACTT 

^421 AATTCACTGG EES GGCCCGCACC 

3481 AATCGCCTTG CAGCACATCC CCCTTTCGCC AGTTTAAGGT TTACACCIAT 

3541 GATCGCCCTT CCCAACAGTT GCGCAGCCTA TACGTAU^ GTGATATTAT ■ TCRCACGCCG 

3I0I AAAAGAGAGA ^ATCG ££££££ SISSS ^GATAA AGTCTCCCGT 

3661 GGGCGACGGA TGGTGATCCC CCTGGCCAW ^^^^^^^ GCATGATGAC CACCGATATG 

3721 GAACTTTACC CGGTGGTGCA ^TCGGGGAT CCGCGAAART 

3781 GCCAGTGTGC CGGTCTCCGT TATCGGGQM GAAGT^u aaaJGTCAGO CCTCARTGGC 

5841 GACATCAAAA ACGCEATTAA CCTGATGTTC TGGGGA^AT JAATGTCA^ 

3901 GAATGGACGC GCCCXOTAGC GGCGCAT^A 6CGCGCGGGT ^gTGG^ ^^^^ 

3961 GACCGCTACA C1TGCCAGCG COTAGCGCC ^CTCCITTC ^ TAGGGTTCCG 

4021 CGCCACGTTC GCCGGCTXTC CCCGTCAAGC TTGGGTGATG GTTCACGTAG 

40B1 ATTTAGAGCT TTACGGCACC TCGACCGCAA J^CT^. ^^^^ CGTTCTTTAA 

4141 TGGGCCATCG CCCTGATAGA ^TTTTTCG £^^£3 J^GGTCT ATTCTTTTGA . 

42 01 TAGTGGACTC TTGTTCCAAA "GGAACAAC ^J^J^ AATG AG CTGA TTTAACAAAA 

4261 TTTATAAGGG *^™CGA JJ^™ g^ScSS GAAGGCGATO CGCTGCGAAT 

4321 ATTTTAACAA AATTCAGAAG AACTCUXWUi CCATTCGCCO CCAAGCTCTT 

4381 CGGGAGOGGC GATACCGTAA AGCACqAGGA *^rCAGC CCCAGCCGGC 

4441 CAGCAATATC ACGGGXAGCC ^CGCXA^ GATATTCGGC AAGCAGGCAT 

4501 CACAGTCGAT GAATCCAGAA JAGCGGCCAT CTGGCGAACA 

4561 CGCCATGGGT CACGACGAGA TCCTCGCU« X~£raGATC ATCCTGATCG ACAAGACCGG 

4621 GTTCGGCTGG CGC^G«TC TGATGOT CGTCCAGATC ^^^.^^gG 

4681 CTTCCATCCG AGTACGTGCT COCTC^TGC ^GT^CGC j^GTTTCTCGG 

4741 TAGCCGGATC AAGCGTATGC AGCCGCCGCA T^cca^T AGCAGCCAGT 

4601 CAGGAGCAAG GTGAGATGAC AGGAGATCCT ^CGGCC GTCGTOGCCA 

4861 CCCTTCCCGC TTCAGTGA^ S^ITCAO GGCACCGGAC AGGTCGGTCT 

4921 GCCACGATAG CCGCGCTGCC ^TCTTGCA CACGGCGGCA TCAGAGCAGC 

4981 TGACAAAAAG AACCGGGCGC CCCTGCGCTG CACCCAAGCG GCCGGAGAAC 

5041 CGATTGTCTG TTGTGCCCAG ^TA^GA J2S££ raTCCTGTC TCTTGATCA3 

5101 CTGCGTGCAA TCCATCTTGT TCAATC^ GRAACGA^ tttacTTTGC 

5161 ATGTTGATCC CCTGCGCCAT CAGATCCTTG <*^™£ E£lTCa CTTGCTGTCC 

5221 AGGGCTTCCC AACCTTACCA GAGGGCGCCC GCAAGCTACC TGCTTTCTCT 

5281 ATAAAACCGC CCAGTCTAGC TATCGCCATG GftCATTCATC CGGGGTCAGC 

5341 TTGCGCTTGC GTTTTCCCTT GTCCAGATAU ft GGTGARGAT C CTTTTTGATA 

5401 ACCGTTTCTG CGGACTGGCT TXCTACGTGA AAAGGATCTA ^ GACCCCGTAG 

5461 ATCTCATGAC CAAAATCCCT £££££ CGTAATCTGC TGCTTGCAAA 

5521 AAAAGATCAA AGGATCTTCT ScCGGA TCAAGAG CTA CCAACTCTW 

5S81 CAAAAAAACC ACCGCTACCA GCGGTGGTTT ^'^^ TACTGTCCTT CTAGTGTAGC 

5641 TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA T ^ GCTCrGCTAA 

5701 CGTAGTTAG3 CCACCACTTC AAGAACTCTG ™G^^ TCT TACCGGG TTGGACTCAA 

5761 TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ggggggTTCG TGCACACAGC 

5621 GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGJTOAAC CTATGAGAAA 

5881 CCAGCTTGGA GCGAACGACC TACACCGAAC ^GATACCT ACA ^ 

5941 GCGCCACGCT TCCCGAAGGG AGAAAGGCGG QTATCTTTAT AGTCCTCTCG 

6001 CftGGAGAGCG CACGAGGGAG ^T??^?^^ TTTTGTGATG CTCCTCAGGG GGGCGGAGCC 

SS EES =cS EES EEi T - c — g—g 

6301 AAGCGGAAG 
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Figure 8 



pMSVLSB-5: 8043 bp; 

Composition 1983 A; 1992 C; 20.. G; .... T; 0 OTHER 
Percentage: 25% A; 25% C; ..% G; ..% T; 0% OTHER 

Molecular Weight (kDa) : ssDNA: 2483.31 dsDNA: 4558.5 

ORIGIN . r*nr*cvi cyzrrtTTGGCC GATTCATTAA TGCAGCTGGC 

AGCGCCCAAT ACGCAAACCG CCTCTCCCCG ^JJGGCC J™*^ 

ACGACAGGTT TCCCGACTGG A^GGGCA ^£f£^ GGCTCGTATG TTGTGTGGAA 

SKS SSSSS SSSS S™ «™ -™ 

T-TAGGK5ACA CTATAGAATA CTCAAGCTAT GGATCAAGCT TGGTACCGAG CTOGGftTCCA 

f^SS SSaS 'aSaagcaca gggggagaat accacttctc ccccggcgac 
SSSctt tocctcgaaa tactccagct gccctggagt catttccttc 
„ SSS£ SSSS SSSSSt attgtaggct tagacttctt ct^ccttt 

661 S^S^S CATACTTGGG GTTTACAATG AAATCCCTGT GACAGCCAAC TAACTGTTTC 
721 TTCTTCTTAC CATACTTGGQ TCTACGATGT TGTAGATTGC GTCTTCGTTG 

781 aSSSS TAATTATGAA*CCCCTAGGCT TCTGGCCCAA 

= =SS = = = 

SSSS 555X SSSS SSSS SSSS SSSS 

TAGGGTGTGA AATAA-^ CGCATTATTr CATCTTTAGA AGGTX™ TCCTTXJCCT 

SraS Ess SSSS SSSS 

TATTTGGGTG AAACCCATXT ATATGAAA^ * ATGrG CCTCT CGGGCACATA 

SSSS SSSS = siss- ssss 
S33! S55S.SK K5S5 j™; ™S 

1741 GGGAGCTCCA ^™ TATACCCGTG SSSS SgCACTTAA 

1801 ATAGTGGTTG TAAATGGGCC GGACCGGGCC GGCCCAGCAG tGATTTAAAG 

1861 TATOA ^ SSSaa gSSSS CTAAAGAAAC CGGTCCCGGG 

SS SSSS SS£££ SSOCATC CTTTAAACTC GAGTCJAGTC 

^^-TAGT aacatagatg acaccgcgcg cgataattta tcctagtttc cgcgciatat 
££E53 aStotataa t^cgggact ctaatcataa a^cccatct 

Zt™twlZn rTTMravTT ACATGTTAAT TATTACATGC TTAAOGTAAT TCAACAGAAA 
SSSS SSSS ScGGCAAC AGGATTCAAT C^AAGAAAC ^~ 

SSSS SSSS SSSS SSSS SSSS sss* 

SSSS SSSS A = GC -GCGCAOG ~GG 
CTGCTCGCCG ATCTCGGTCA «GOTMCCC GGAGGCGTCC . ^ CCAGGGTGTT 
CTCCGACCAC TCGGCGTACA GCTCGICCAG ^CCGCG^ GTCACGTCGT CCCGGACCAC 
GTCCGGCACC ACCTGGTCCT GGACCGCGCT -„„™ rT( ™ TCCAGAACTC 

ACCGGCGAAG TCGTCCTCCA CGAAGTCCOG GGAGAACCCG Jg^^ ACTTGGCCAT 
GACCGCTCCG GCGACGTCGC GCGCGGTGAG CAC^GAACG GCACTGGTCA JCTTGGCCA 
GGTGGCCCTC CTCACGTGCT ATTATTGAAG CMTTMCJQ SS5S S^CCCCG 
SSSS = cSScAGA, GCGTAAGGAG AAAATACCGC 
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Figure 8 («**<» 



2S41 
3001 
3061 



~* rvrrrrCTTA ATTAAGTCGA CGTCCTCTCC AAATGAAATG 
ATCAGGCGAA ATTGTAAACG CGGCCGCTTA GGATTGTGCG TCATCCCTTA 

AACTTCCTTA TATAGAGGAA J££££J CGTGGTTGGA ACGTCTTCTT 

„„_ CGTCAGTGGA GATATCACAT CAATCCACTT ACCACTGTCG GCAGAGGCAT 

3121 TXTCCAOGTA GCTCCTCGTG GGrGGGGGTC TAGGTGCGAC CTTCCTTTTC 

3181 CTTGAACGAT AGCCTTTCCT TATCGCAATG AATCCGAGGA GGTTTCCCGA 

3241 TACTGTCCTT TTGATGAAGT GACA^AGC TATCTTTGAT 

3301 TATTACCCTT . TGTTGAAAAG TCTCAATAGC CCTTTGG £ ATGGGCAGAC 

3 61 ATTCTTGGAG TAGACGAGAG *^£GTGC ££££££ AAAACTCCCG TTITArTATA 

3421 CCGTCTGTAC TTTAAGAGTG TTGGCAACCA TGGCACGAAA AAACACACGC 

3481 TTTGATGAAT GCTGAAAGCT TACATTAATA ^KXKA TGCTCGGCGG GCAGAACATC 

3541 AAACAATACA GGGGGGTAGT OGGCGGGCGG CTAAGGGI^ GGGGGAGAAT 

3601 GAAAAATCAA GATCTATATG AArTACACTT ^TCCGTAGG AGGAAGCA^ TACTCCAGCT 

3661 ACCACMTCTC CCCCGGCGAC ATAATGTAAA GGCGAGGATX AXTGTAGGCT 

3721 GCCCTGGAGT CATTTCCTTC ATC^AATCTT AAATCCCTCT 

3781 TAGACrrCTT CTGCACCT^ TTCT^C CGGAATATCA TCTACSSATGT 

3841 GACAGCCAAC TAACTGTTTC ATTTTGCCAG TAATTATGAA 

3901 TGTAGATTGC GTCTTCGTTG TATG^GACC ^AA^ ^^^cs ATGTAGAGGC 

3961 CCCCTAGGCT TCTGGCCCAA GTAGAWTTC CGG^CTTGT 

4021 TCTGCTTTCT TGATCTTTCA TCTGATGACT JGATACAGAA OBgghCaM 

4081 AATTGCATCC TCGAGGGTAT ^CAGOTAGG "GAAU^ AMCAGGTGA 

4141 CTGGAAGATG TTAGGCTGGA GCCAATCGTT °ATTGACTCA TATTTGCAGA 

4201 GGAGGGTGGA TGAGOATTGG TGAACTCTTC AGCTCTTTCT GGATCATGGA 

4261 GTATTCAAAA IACTGCAATT TTQTGGMCA CGCATTATTT CATCTTTAGA 

4 ? 21 GAG ^^ SEES SttSagg AAGGGGGACT TCCTAGGAAT 

4381 AGGCTTTTTT TCCTTTACCT CTGAATCA^ ^ ATOTAATCCC TCACTCTGTT 

4441 GAAAGTACCT CTCTCAAACA CAGCCAGAGG ATATCARAGA ACCTTGAGTC 

4501 AACreACTTO ^CTCTGAA MGGGTG StGCAAAC TTCCATCITr 

4561 AGATATCCTT ATCGGCTTCT ^TJAAG S^JSaA CGAACGACGA GCTCCCAGAT 

4621 ATGTGCCTCT CGGGCACATA GAATATATTT GGGAAT^AA j^cGTGTTAGC 

4681 CATCTGACAG GCGATTTCAG GAOTCTGG ^TTTW^ ACGGAGGTTG 

4741 GTTCCTGIGT GAGAACTGAC GGT^GATGA GGAGGAGGCC CGCCTTCGAA 

4801 AGGCTGABGG ATOGCAGACT ^5 SaCCGGGCC GGCCCAGCAG 

4861 ATCCGCOGCT CCATTCTCTT ATAGTGGTTC J^J^, CTGCGAGGGC CCGGGGTAGG 

4921 GAAAAGAAGG Jj^J^?*^!?^ AGCCTGGTTC TGCTTTGTAT GATTTATCTA AAGCAGCCCA 

4981 GACCGAGCGC TTTGATTTAA AGCCTGGTTC AAGTGCGATT. CATTCATGGA 

5041 ATCTAAAGAA ACCGGTCCCG ££S5£ gSStA TTACAATTCA 

5101 TCCTTTAAAC TCGAGTCTAG AGGGCCCAAT GCGTTACCCA ACTTAATCGC 

5161 CTGGCCGTCG TTTTACAACG ^GTG^CTGG GAAAACCCTU CACOGATCGC 

5221 CTTGCAGCAC ATCCCCCTTT CGCCAGCTGG CGTAATAGCG JJ^^ CTATAARAG A 

5281 CCTTCCCAAC AGTTGCGCAG CCTATACGTA OGGCAGaix GC CGGGGCGA 

5341 GAGAGCCGTT ATCGTCTGTT TGTGGATGTA CAGAGTGATA CCGTGAACTT 

5401 CGGATGGTGA TCCCCCTGGC CAGTGCACGT CTGCTGTCAG TATGGCCAGT 

5461 TACCCGGTGG TGCATATCGG GGATGAAAGC TGQCGCATGA AAATGACATC 

5521 GTGCCGGTCT CCGTTATCGG ^AAGAAGTG GCTGATCTCA TGGCGAATGG 

5581 AAAAACGCCA TTAACCTGAT G^CTGGGGA JJATAAA ^ gcGXGACCGC 

5641 ACGCGCCCTG TAGCGGCGCA TTAAGCGCGC GGGTG^ TTCCCTTCCT TTCTCGCCAC 

5701 TACACTTGCC AGCGCCCTAG CGCCCGCTCC CCTTTAGGGT TCCGATTTAG 

5761 GTTCGCCGGC rCTCCCCGTC AAGCTCTAAA GATGGTTCAC GTAGTGGGCC 

5821 AGCTTTACGG CACCTCGACC CCAAAAAACT ^ A ™^^ TCCACGTTCT TTAATAGTGG 

5881 ATCGCCCTGA TAGACGGTTT TTOBCCCTTT GACGTTGGAG TCUA 

59 41 ACTCTTGTTC CAAA^GGAA ^CACTCAA CCCTATCGCG ^^A 

6001 AGGGATGTTG CCGATTTCGG CCTATTGQTT GATGCGCTGC GAATCGGGAG 

6061 ACAAAATTCA GAAGAACTCG TCAAGAAGGC ^TAGAAGGC TCTTCAQCAA 

6121 CGGCGATACC GTAAAGCACG AGGAAGCGGT ^^JJ^ ^CACCCAGC CGGCCACAGT 

6181 TATCACGGGT AGCCAACGCT ATGTCCTGAT JCCGGTCCGC « GCATCGCCAT 

•6241 CGATGAATCC AGAAAAGCGG C^TTTTCCA fCATGATATT ™ AACAGTTCGG 



6301 



CGATGAATCC AGAAAAGCGG CCATTTTCCA CCAiWi^ oagCCTGGCG AACAGTTCGG 
GGGTCACGAC GAGATCCTCG CCGTCGGGCA TGCTCGCCTT GAGCCTGGCG A* 
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Figure 8 (cont'd) 

.prTTCCTCCA GATCATCCTG ATCGACAAGA OOGGCTTOCA 

6361 CTGGCGOGAG CCCCTGATGC TCTTCGTCCA GATU^ GTCGAATGGG CASGTAGCCG 

M2 1 TCCGAGTACO TGCTCGCTCG MGCGATGTT ^^c^c TCGGCAGGAG 

6481 GATCAAGCGT AIGCAGCCGC CGCATTGCAT CAATAGCAGC CAGTCCCTTC 

65 41 CAAGGTGAGA TGACAGGAGA ^CTGCCCCG <^£g£L GCC CGTCGTG GCCAGCCACG 

6601 CCGCTTCAGT ^AACGTCG ^SgGCACC GGACAGGTCG GTCTTGACAA 

6661 ATAGCCGCGC ^CCTCG^ TGCAGCTCAT GGCATCAGAG CAGCCGATTG 

6721 AAAGAACCGG GCX3CCCCTGC GCTGACAGCC GGAACAC^ AGCGGCCGGA GAACCTGCGT 

6781 TCTGTTGTGC CCAGTCATAG CCGAATftGCC TCTCUAU ^^^^ TCAGATCTTG 

6841 GCAATCCATC ^"CAATC ^CGAA^CG JJ^^C ^TTrACT ^GCAGGGCT 

6901 ATCCCCTGCG CCATCAGATC CCTGGCGGCG J^JJ^ TTCGCTTGCT GTCCATAAAA 

6961 TCCCAACCTT ACCAGAGGGC GCCCCAGCTG GCAATTCCG« CTCTTTGCGC 

7021 CCGCCCAGTC TTkGCTATCGC CATGTAAGCC ^TGCAAGC ^^c^ 
TTOCGTTTTC CCTTGTCCAG ATAGCCCAGT AGCTGAC^vri GATAATCTCA 

TGACCAAAAT CGCT^ACGT £^ .^^^re CAAACAAAAA 

TCAAAGGATC TTCTTCAGAf CCOTCTC GCTACCAACT CTmTCCGA 

™SgI SSSS ^Sctgt ccttctagtg xagcogtaot 

AGGTAACTGG CTTCAGCAGA GCGCAGATA^- . cctcgctcTC CTAATCCTGT 

taggccacca cttcaagaac tctgtagcac cscc^a C0GOTTOGAC tca&GACGAT 

^CCAGTGGC ^CTGCCAGT ^CGATAAGT OGTGTCTXAC ^^^^ q^jcc(3VGCT ~ 
AOTTACCGGA TAAGGCGCAG CGOTOGGGCT ^"L TGAGCTATGA GAAAGCGCCA 

TGGAGCGAAC GACCTACACC ^TGAGAT ACCTACAGCG ^ 
CGCTTCCCGA AGGGAOAAAG GCGGACAGGT ATCCGGTAAG grcgs,^ 
AGCGCACGAG GGAGCTTCCA gatGCTCGTC" AGGGGGGCGG AGCCTATGGA 

S= -SSSS cSS SSSS GTCAG^GC GAGGAAGCGO 
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7261 
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7381 
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Figure 9 



pMSVLSB-6: 7404 bp; 

Composition 1839 A; 1794 C; 1835 G; 1936 T; 0 OTHER 
Percentage: 25% A; 24% C; 25% G; 26% T; 0% OTHER 

Molecular Weight (kDa) : ssDNA: 2286.33 dsDNA: 4564.5 

ORIGIN 

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 
181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 

241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGTACCGAG CTCGGATCCA 
301 CTAGTAACGG COGCCAGXGT GCTGGAATTC nTGATGAAT GCTOAAAGCT 

361 TTGGCRACCA GTAATGAMA AAAACTCCCG ™'^H\ aAACAATACA GGGGGGttftGT 
III TACATTAATA TGTCQTGCGA ^CgGAAA AAACACA^C JJJCAA 
481 CGGCGGGCGO CTAAGGGTGG T^TCGGCGG ^JGAAUA^ ^c^^c 
541 AATTACACTT CCTCCGTAGG AGGAAGCACA GGGGGAUA^ GCCCTGGAGT CATTTCCTTC 
111 ATAATGTAAA TGACGCAGTT TGCCTCGAAR TACTCCAGCT « 

661 ATCCAATCTT CATCCGAGW GGCGA^ aaSScTCT GACAGCCAAC TAACTGTTTC 
721 TTCTTCZTAC CATACTTGGG GT^CAATG JAATCCCTCT GTCTTCGrrG 
781 CAACAAGGAC AGAATTTAAA CGGMTATCA £^ ccccTAGGCT tCTGGCCCAA 

841 TATGAAGACC ^TCAACATT I^S^cSc TCTCCTTTCT TOATCTO 

501 GTAGATTTTC CGG™TGT JSSxCMA AATTGCATCC TCGAGGGOAT 

951 TCTGATGACT GGATACAGAA TCCATCCMT CTGGAAGATG TTAGGCTGGA 

1021 AACAGGTAGG XTGAAGGAGC ^AAGCTT ^GA^ OGAGGGTGGA TGAGGATTGG . 
GCCAATCGTT GATTGACTCA ^OU^OTA ^J^flA GTATTCAAAA .TACTGCAATT 
TGAACTCTTC CTGAAXCTCA ^J^CT GAGGTACTCT TCTITOGAGG 

TTGTGGACCA ATCAAAGGG3 AGCTCTTTCT ™^ ^ AGGCTTTTTT TCCTTTACCT 
TAGOGTGTGA AATAATOTCT ^TTATTT ^c^ccT CTCTCAAACA 

CTGAATCAGA TOTCCTAGG AAGGGGGACT JCCTAGGA^ ^ qcACTCTGAA 

CAGCCAGAGG TTCCTIGAGA ATOTAATCCC JCACT^v aGATATCCTT ATCGGCTTCT 
TATTTGGGTG AAACCCATTT ATATCAAAGA ^TTGAGTC ^ CGGGCACATA 
CTGGCT3AAG CAATGCATGT AARTGCAAAC TTCCATCTTT ^ QOGATTTCAQ 
GAATATATTT GGGAATCCAA CGAACGACGA GCTCCCAGAT GAGAACTGAC 
OATTTTCTGG ACACTTTGGA TAGGTTAGGA JCGTGTTAGC ATC GCAGACT 

ggttcgatoa ggaggaggcc atagccgacg JCGGaggttg J ^ CCATO3Tcrr 

GGGAGCTCCA AACTCTATAG TATACCCGTG GAAAAGAAGG CGCGCACTAA 

ATAGTGGTTG TAAATGGGCC GGACCGGGCC GGCCCAGC^ ^ jqaTTTAAAG 

TATTACCGCG CCTTCTTTTC CTGCGAGGGC CCGGTA«*£ CTAAAGAAAC CGGTCCCGGG 
CCTGGTTCTG CTTTGTATGA TTTATCTAAA «WgCCAAT GAGTCfAGTC 
CACTATAAAT TGCCTAACAA GTGCGATTCA ^ATGGATC ^JJ^ cqCGCTATAT 
CCGATCTAGT AACATAGATG JCACCG«^ ££££££ SSSSa AAACCCA^T 
TTTGTTTTCT ATCGCGIATT AAATGTATAA ^ TTAACGTAAT TCAACAGAAA 

CATAAATAAC GTCATGCATT ACATGTTAAT TATTACATGC ^ TTTATTGCCA 
TTATATGATA ATCATCGACA GACCGGCAAC ™££££L GCGGCCGCTT AATTAAGTCG 
AATGTTTGAA CGATCGGGGA AATTCGCTCG JGTJ^™ AGGGTCTTGC GAAGGATAGT 
ACGTCCTCTC CAAATGAAAT GAACTTCCTT ^ATAGABGA «^ tgCTTTCAAG 
GGGATTGTGC GTCATCCCTT ACGTCAGTGG JGATATCACA XCAATCCACT T^^, 
ACGTGGTTGG AACGTCTTCT TTTTCCACGT J£CTCCTC^ gatGGCATTT 
GACCACTGTC GGCAGAGGCA TCTTGAACGA TGACAGATAG CTGGGCAATG 

GTAGGTGCCA CCTTCCUTT CTACTGTCCT gtCTCAATAG CCCTTTGGTC 

GAATOCGAGG AGGTTTCCCG ATATTACCCT ^TTCAAAA ^ CT CCACCATG 

TTCTGAGACT GTATCTTTGA TATTCTTGGA GXAGACGAGA GAGTGTCGTG ^ 
TTGACGAATT CATGGGCAGA CCCGTCTGTA CTTTOAGAGT . JgJ ATGTCGTGCG 
AAAAACTCCC GTTTTATTAT ATTTGATGAA TGCTGAAAGP TTACATTAA x » 






WO 01/77350 



PCT/US01/11436 



Figure 9 (cont'd) 



2 881 ATGGCACGAA AAAACACAC0 CAAACAATAC AGGGGGGTAG TCGGCGGGCG GCTAAGGGTG 
2941 - GTGCTCGGCG GGCAGAACAT CGAAAAATCA AGATCTATAT GAATTACACT TCCTCCGTAG 
3001 GAGGAAGCAC AGGGGGAGAA TACCACTTCT CCCCCGGCGA CATAATGTAA ATCACGCAGT 

3 061 TTGCCTCGAA ATACTCCAGC TGCCCTGGAG TCATTTCCTT CATCCAATCT TCATCCGAGT 
3121 TGGCGAGGAT TATTGTAGGC TTAGACTTCT TCTGCACCTT TTTCTTCTTA CCATACTXGG 
3181 GGTTTACAAT GAAATCCCTC TGACAGCCAA CTAACTGTTT CCAACAAGGA CAGAATTTAA 
3241 ACGGAATATC ATCTAC6ATG TTGTAGATTG CGTCTTCGTT GTATGA AGAC CAATCAACftT 
3301 TATTTTGCCA GTAATTATGA ACCCCTAGGC TTCTGGCCCA AGTAGATTTT CCGGTTCTTG 
33 61 TTGGGCCGAC GATGTAGAGG CTCTGCTTTC TTGATCTTTC ATCTGATGAC TGGATACAGA 
3421 ATCCATCCAT TGGAGGTCAG AAATTGCATC CTCGAGGGTA TAACAGGTAG GTTGAAGGAG 
3481 CATCTAAGCT TCGGGACTAA CCTGGAAGAT GTTAGGCTGG AGCCAATCGT TGATTGACTC 
3541 ATTACAAAGT AAATCAGGTG AGGAGGGTGG ATGAGGATTG GTGAACTCTT CCTGAATCTC 
3601 AGGAAAAAGC TTATTTGCAG AGTATTCAAA ATACTGCAAT TTTGTGGACC AATCAAAGGG 
3661 GAGCTCTTTC TGGATCATGG AGAGGTACTC TTCTTTGGAG GTAGCGTGTG AAATAATGTC 
3721 TCGCATTATT TCATCTTTAG AAGGCTTTTT TTCCTTTACC TCTGAATCAG ATTTTCCTAQ 
3-781 GAAGGGGGAC TTCCTAGGAA TGAAAGTACC TCTCTCAAAC ACAGCpAGAG GTTCCTTGAG 
3841 AATGTAATCC CTCACTCTGT TAACTGACTT GG CACTCTGA* ATATTTGGGT GAAACCCATT 
3501 TATATCAAAG AACCTTGAGT CAGATATCCT TATCGGCTTC TCTGGCTGAA GCAATGCATG 
3961 TAAATGCAAA CTTCCATCTT TATOTGCCTC TGGGGGACAT AGAATATATT TGGGAATCCA 
4021 ACGAACGACG AGCTCCCAGA TCATCTGACA GGCGATTTCA GGATTTTCTG GACACTTTOG 
4081 ATAGGTTAGG AACGTGTTAG CGTTCCTGTG TGAGAACTGA CGGTTGGATG AGGAGGAGGC 
4141 GATAGCCGAC GACGGAGGTT GAGGCTGAGG GATGGCAGAC TGGGAGCTCC AAACTCTATA 
4201 GTATACCCGT GCGCCTTCGA AATCCGCCGC TCCATTGTCT TATAGTGGTT GTAAATGGGC
4261 CGGACCGGGC CGGCCCAGCA GGAAAAGAAG GCGCGCACTA ATATTACCGC GCXTTTCTTTT 
4321 CCTGCGAGGG CCCGGGGTAG GGACCGAGCG CTTTGATTTA AAGCCTGGTT CTGCTTTGTA 
4381 TGATTTATCT AAAGCAGCCC AATCTAAAGA AACCGGTCCC GGGCACTATA AATTGCCTAA
4441 CAAGTGCGAT TCATTCATGG ATCCTTTAAA CTCGAGTCTA GAGGGCCCAA TTCGCCCTAT 
4501 AGTGAGTCGT ATTACAATTC ACTGGCCGTC GTTTTACAAC GTCGTGACTG GGAAAACCCT
4561 GGCGTTACCC AACTTAATCG CCTTGCAGCA CATCCCCCTT TCGCCAGCTG GCGTAATAGC
4621 GAAGAGGCCC GCACCGATCG CCCTTCCCAA CAGTTGCGCA GCCTATACGT ACGGCAGTTT 
4681 AAGGTTTACA CCTATAAAAG AGAGAGCCGT TATCGTCTGT TTGTGGATGT ACAGAGTGAT 
4741 ATTATTGACA CGCCGGGGCG ACGGATGGTG ATCCCCCTGG CCAGTGCACG TCTGCTGTCA 
4801 GATAAAGTCT CCCGTGAACT TTACCCGGTG GTGCATATCG GGGATGAAAG CTGGCGCATG
4861 ATGACCACCG ATATGGCCAG TGTGCCGGTC TCCGTTATCG GGGAAGAAGT GGCTGATCTC
4921 AGCCACCGCG AAAATGACAT CAAAAACGCC ATTAACCTGA TGTTCTGGGG AATATAAATG 
4981 TCAGGCCTGA ATGGCGAATG GACGCGCCCT GTAGCGGCGC ATTAAGCGCG CGGGTGTGGT
5041 GGTTACGCGC AGCGTGACCG CTACACTTCC CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT 
5101 CTTCCCTTCC TTTCTCGCCA CGTTCGCCGG CTTTCCCCGT CAAGCTCTAA ATCGGGGGCT
5161 CCCTTTAGGG TTCCGATTTA GAGCTTTACG GCACCTCGAC CGCAAAAAAC CTGATTTGGG 
5221 TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT TTTCGCGCTT TGACGTTGGA
5281 GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA ACAACACTCA ACCCTATCGC
5341 GGTCTATTCT TTTGATTTAT AAGGGATGTT GCCGATTTCG GCCTATTGGT TAAAAAATGA
5401 GCTGATTTAA CAAAAATTTT AACAAAATTC AGAAGAACTC GTCAAGAAGG CGATAGAAGG 
5461 CGATGCGCTG CGAATCGGGA GCGGCGATAC CGTAAAGCAC GAGGAAGCGG TCAGCCCATT
5521 CGCCGCCAAG CTCTTCAGCA ATATCACGGG TAGCCAACGC TATGTCCTGA TAGCGGTCCG 
5581 CCACACCCAG CCGGCCACAG TCGATGAATC CAGAAAAGCG GCCATTTTCC ACCATGATAT 
5641 TCGGCAAGCA GGCATCGCCA TGGGTCACGA CGAGATCCTC GCCGTCGGGC ATGCTCGCCT
5701 TGAGCCTGGC GAACAGTTCG GCTGGCGCGA GCCCCTGATG CTCTTCGTCC AGATCATCCT
5761 GATCGACAAG ACCGGCTTCC ATCCGAGTAC GTGCTCGCTC GATGCGATGT TTCGCTTGGT
5821 GGTCGAATGG GCAGGTAGCC GGATCAAGCG TATGCAGCCG CCGCATTGCA TCAGCCATGA 
5881 TGGATACTTT CTCGGCAGGA GCAAGGTGAG ATGACAGGAG ATCCTGCCCC GGCACTTCGC 
5941 CCAATAGCAG CCAGTCCCTT CCCGCTTCAG TGACAACGTC GAGCACAGCT GCGCAAGGAA
6001 CGCCCGTCGT GGCCAGCCAC GATAGCCGCG CTGCCTCGTC TTGCAGTTCA TTCAGGGCAC 
6061 CGGACAGGTC GGTCTTGACA AAAAGAACCG GGCGCCCCTG CGCTGACAGC CGGAACACGG 
6121 CGGCATCAGA GCAGCCGATT GTCTGTTGTG CCCAGTCATA GCCGAATAGC CTCTCCACCC 
6181 AAGCGGCCGG AGAACCTGCG TGCAATCCAT CTTGTTCAAT CATGCGAAAC GATCCTCATC 
6241 CTGTCTCTTG ATCAGATCTT GATCCCCTGC GCCATCAGAT CCTTGGCGGC GAGAAAGCCA 
Figure 9 (cont'd)

6301 TCCAGTTTAC TTTGCAGGGC CTCCCAACCT TACCAGAGGG CGCCCCAGCT GGCAATTCCG
6361 GTTCGCTTGC TGTCCATAAA ACCGCCCAGT CTAGCTATCG CCATGTAAGC CCACTGCAAG
till SStTGCG CTTCCGTTTT CCClTOTCCA GATAGCCCAG TAGCTGACAT 

Al, tcaSaccgt TTCTCCGGAC tcgctttgta cgtgaaaagg atctaggtga 

SSSSS atgaccaaaa tcc™cg £££££ -cca^J 

~* /v-reAGACCC CGTAGAAAAG ATCAAAGGAT CTTCTTGAGA TCCTTTTTTT CTCOGCGTAA 

£ « SS^S Saacaaaa aaagcaccgc taccagcggt ggtttgtttg ccggatcaag 

6661 ?52S22J SttTCCG AAGGTAACTC GCTTCAGCAG AGCGCAGATA CCAAATACTG 
™f TCCTTCTAGT GTAGCCOTAG TTAGGCCACC ACITCAAGAA CTCTGTAGCA CCGGCTACAT 
f Z!, InSSSS SaMCCTG SaCCAGTGG CTCCTCCCAG TGGCGATAAG TCGTGTCTTA 

I J£ SSSSgS ctcaagacga tagttaccgg ataaggcgca gcggtcgggc tgaacggggg 
till SSSSS? SScSS? ttggagcgaa cgacctacac cgaactgaga tacctacagc 
SgSSSS aSaagcgcc- acgcttcccg aagggagaaa ggcggacagg tatccggtaa 

III 1 , S^S? JSaACAGGA GAGCGCACGA GGGAGCrrcC AGGGGGAAAC GCCTGGTATC 

SSSgS CGCCACCKrr gacttgagcg tcgatttttg tgatgctcgt 
Si? SSScg gagcctatog aaaaacgcca gcaacgcggc ctttttacgg jtcctqggct 

llll SSSSS SSctCAC ATGTiCTTTC CTOCGTTATC CCCTGATTCT GTGGATAACC 

S£ SSSS? SSgSS^ gc^ataco, ctcgccgcag ccgaacgacc gagcgcagcg 
SEQUENCE LISTING

<110> LARGE SCALE BIOLOGY CORPORATION 

<120> COMPOSITIONS AND METHODS FOR INHIBITING 
GENE EXPRESSION 

<130> 008010177PCOO 

<140> To Be Assigned
<141> 2001-04-04 

<150> 09/545,574 
<151> 2000-04-07 

<160> 14 

<170> FastSEQ for Windows Version 3.0

<210> 1 
<211> 27 
<212> DNA 

<213> Cauliflower mosaic virus 
<400> 1 

tttgaattcg tcaacatggt ggagcac 

<210> 2 
<211> 31 
<212> DNA 

<213> Cauliflower mosaic virus 
<400> 2 

tttgtcgacg tcctctccaa atgaaatgaa c 

<210> 3 
<211> 46 
<212> DNA 

<213> Artificial Sequence
<220> 

<223> zeocin resistance gene 
<400> 3

cccgtcgact taattaagcg gccgcgttta caatttcgcc tgatgc 
<210> 4 

<211> 47 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> zeocin resistance gene 
<400> 4 

cccctcgagt taattaagcg gccgcctcaa aaaggatctt cacctag 
<210> 5
<211> 32 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> nopaline synthase gene (nos) terminator sequence 
<400> 5 

tttctcgagc gaatttcccc gatcgttcaa ac 



<210> 6 

<211> 32 

<212> DNA 

<213> Artificial Sequence 



<220> 

<223> nopaline synthase (nos) terminator sequence

<400> 6 

tttactagtc ccgatctagt aacatagatg ac

<210> 7 

<211> 29 

<212> DNA

<213> maize 



<400> 7 

tttttaatta aggtccgcct gaattctcg 

<210> 8 
<211> 30 
<212> DNA 
<213> maize 

<400> 8 

tttttaatta acggcaaggc tcacagtttg 



<210> 9 
<211> 4881 
<212> DNA 

<213> Viral
<400> 9 

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300

ctagtaacgg ccgccagtgt gctggaattc atgggcagac ccgtctgtac tttaagagtg 360

ttggcaacca gtaatgaata aaaactcccg ttttattata tttgatgaat gctgaaagct 420

tacattaata tgtcgtgcga tggcacgaaa aaacacacgc aaacaataca ggggggtagt 480

cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc gaaaaatcaa gatctatatg 540

aattacactt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 600

ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct gccctggagt catttccttc 660

atccaatctt catccgagtt ggcgaggatt attgtaggct tagacttctt ctgcaccttt 720

ttcttcttac catacttggg gtttacaatg aaatccctct gacagccaac taactgtttc 780

caacaaggac agaatttaaa cggaatatca tctacgatgt tgtagattgc gtcttcgttg 840 

tatgaagacc aatcaacatt attttgccag taattatgaa cccctaggct tctggcccaa 900 

gtagattttc cggttcttgt tgggccgacg atgtagaggc tctgctttct tgatctttca 960 

tctgatgact ggatacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 1020 

aacaggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 1080 

gccaatcgtt gattgactca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 1140

tgaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa tactgcaatt 1200

ttgtggacca atcaaagggg agctctttct ggatcatgga gaggtactct tctttggagg 1260

tagcgtgtga aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 1320

ctgaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 1380

cagccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 1440 

tatttgggtg aaacccattt atatcaaaga accttgagtc agatatcctt atcggcttct 1500 

ctggctgaag caatgcatgt aaatgcaaac ttccatcttt atgtgcctct cgggcacata 1560 

gaatatattt gggaatccaa cgaacgacga gctcccagat catctgacag gcgatttcag 1620 

gattttctgg acactttgga taggttagga acgtgttagc gttcctgtgt gagaactgac 1680 

ggttggatga ggaggaggcc atagccgacg acggaggttg aggctgaggg atggcagact 1740

gggagctcca aactctatag tatacccgtg cgccttcgaa atccgccgct ccattgtctt 1800 

atagtggttg taaatgggcc ggaccgggcc ggcccagcag gaaaagaagg cgcgcactaa 1860 

tattaccgcg ccttcttttc ctgcgagggc ccggtaggga ccgagcgctt tgatttaaag 1920 

cctggttctg ctttgcggcc gctcgagcat gcatctagag ggcccaattc gccctatagt 1980 

gagtcgtatt acaattcact ggccgtcgtt ttacaacgtc gtgactggga aaaccctggc 2040 

gttacccaac ttaatcgcct tgcagcacat ccccctttcg ccagctggcg taatagcgaa 2100 

gaggcccgca ccgatcgccc ttcccaacag ttgcgcagcc tatacgtacg gcagtttaag 2160 

gtttacacct ataaaagaga gagccgttat cgtctgtttg tggatgtaca gagtgatatt 2220 

attgacacgc cggggcgacg gatggtgatc cccctggcca gtgcacgtct gctgtcagat 2280 

aaagtctccc gtgaacttta cccggtggtg catatcgggg atgaaagctg gcgcatgatg 2340

accaccgata tggccagtgt gccggtctcc gttatcgggg aagaagtggc tgatctcagc 2400 

caccgcgaaa atgacatcaa aaacgccatt aacctgatgt tctggggaat ataaatgtca 2460 

ggcctgaatg gcgaatggac gcgccctgta gcggcgcatt aagcgcgcgg gtgtggtggt 2520

tacgcgcagc gtgaccgcta cacttgccag cgccctagcg cccgctcctt tcgctttctt 2580 

cccttccttt ctcgccacgt tcgccggctt tccccgtcaa gctctaaatc gggggctccc 2640 

tttagggttc cgatttagag ctttacggca cctcgaccgc aaaaaacttg atttgggtga 2700

tggttcacgt agtgggccat cgccctgata gacggttttt cgccctttga cgttggagtc 2760 

cacgttcttt aatagtggac tcttgttcca aactggaaca acactcaacc ctatcgcggt 2820 

ctattctttt gatttataag ggatgttgcc gatttcggcc tattggttaa aaaatgagct 2880 

gatttaacaa aaattttaac aaaattcaga agaactcgtc aagaaggcga tagaaggcga 2940 

tgcgctgcga atcgggagcg gcgataccgt aaagcacgag gaagcggtca gcccattcgc 3000

cgccaagctc ttcagcaata tcacgggtag ccaacgctat gtcctgatag cggtccgcca 3060 

cacccagccg gccacagtcg atgaatccag aaaagcggcc attttccacc atgatattcg 3120

gcaagcaggc atcgccatgg gtcacgacga gatcctcgcc gtcgggcatg ctcgccttga 3180 

gcctggcgaa cagttcggct ggcgcgagcc cctgatgctc ttcgtccaga tcatcctgat 3240 

cgacaagacc ggcttccatc cgagtacgtg ctcgctcgat gcgatgtttc gcttggtggt 3300 

cgaatgggca ggtagccgga tcaagcgtat gcagccgccg cattgcatca gccatgatgg 3360 

atactttctc ggcaggagca aggtgagatg acaggagatc ctgccccggc acttcgccca 3420 

atagcagcca gtcccttccc gcttcagtga caacgtcgag cacagctgcg caaggaacgc 3480

ccgtcgtggc cagccacgat agccgcgctg cctcgtcttg cagttcattc agggcaccgg 3540

acaggtcggt cttgacaaaa agaaccgggc gcccctgcgc tgacagccgg aacacggcgg 3600 

catcagagca gccgattgtc tgttgtgccc agtcatagcc gaatagcctc tccacccaag 3660 

cggccggaga acctgcgtgc aatccatctt gttcaatcat gcgaaacgat cctcatcctg 3720

tctcttgatc agatcttgat cccctgcgcc atcagatcct tggcggcgag aaagccatcc 3780

agtttacttt gcagggcttc ccaaccttac cagagggcgc cccagctggc aattccggtt 3840

cgcttgctgt ccataaaacc gcccagtcta gctatcgcca tgtaagccca ctgcaagcta 3900

cctgctttct ctttgcgctt gcgttttccc ttgtccagat agcccagtag ctgacattca 3960

tccggggtca gcaccgtttc tgcggactgg ctttctacgt gaaaaggatc taggtgaaga 4020

tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc cactgagcgt 4080

cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg cgcgtaatct 4140 
gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg gatcaagagc 4200

taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca aatactgtcc 4260 

ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg cctacatacc 4320 

tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg 4380

ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga acggggggtt 4440 

cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac ctacagcgtg 4500 

agctatgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg 4560 

gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc tggtatcttt 4620 

atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga tgctcgtcag 4680 

gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc ctgggctttt 4740 

gctggccttt tgctcacatg ttctttcctg cgttatcccc tgattctgtg gataaccgta 4800 

ttaccgcctt tgagtgagct gataccgctc gccgcagccg aacgaccgag cgcagcgagt 4860

cagtgagcga ggaagcggaa g 4881 

<210> 10 
<211> 3413 
<212> DNA 
<213> viral 



<400> 10 

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60
acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120
tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180
ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240
ttaggtgaca ctatagaata ctcaagctat gcatcaagct tgggcccggt agggaccgag 300
cgctttgatt taaagcctgg ttctgctttg tatgatttat ctaaagcagc ccaatctaaa 360
gaaaccggtc ccgggcacta taaattgcct aacaagtgcg attcattcat ggatccttta 420
aactcgagtc tagagggccc gaattctgca gatatccatc acactggcgg ccgctcgagc 480
atgcatctag agggcccaat tcgccctata gtgagtcgta ttacaattca ctggccgtcg 540
ttttacaacg tcgtgactgg gaaaaccctg gcgttaccca acttaatcgc cttgcagcac 600
atcccccttt cgccagctgg cgtaatagcg aagaggcccg caccgatcgc ccttcccaac 660
agttgcgcag cctatacgta cggcagttta aggtttacac ctataaaaga gagagccgtt 720
atcgtctgtt tgtggatgta cagagtgata ttattgacac gccggggcga cggatggtga 780
tccccctggc cagtgcacgt ctgctgtcag ataaagtctc ccgtgaactt tacccggtgg 840
tgcatatcgg ggatgaaagc tggcgcatga tgaccaccga tatggccagt gtgccggtct 900
ccgttatcgg ggaagaagtg gctgatctca gccaccgcga aaatgacatc aaaaacgcca 960
ttaacctgat gttctgggga atataaatgt caggcctgaa tggcgaatgg acgcgccctg 1020
tagcggcgca ttaagcgcgc gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc 1080
agcgccctag cgcccgctcc tttcgctttc ttcccttcct ttctcgccac gttcgccggc 1140
tttccccgtc aagctctaaa tcgggggctc cctttagggt tccgatttag agctttacgg 1200
cacctcgacc gcaaaaaact tgatttgggt gatggttcac gtagtgggcc atcgccctga 1260
tagacggttt ttcgcccttt gacgttggag tccacgttct ttaatagtgg actcttgttc 1320
caaactggaa caacactcaa ccctatcgcg gtctattctt ttgatttata agggatgttg 1380
ccgatttcgg cctattggtt aaaaaatgag ctgatttaac aaaaatttta acaaaattca 1440
gaagaactcg tcaagaaggc gatagaaggc gatgcgctgc gaatcgggag cggcgatacc 1500
gtaaagcacg aggaagcggt cagcccattc gccgccaagc tcttcagcaa tatcacgggt 1560
agccaacgct atgtcctgat agcggtccgc cacacccagc cggccacagt cgatgaatcc 1620
agaaaagcgg ccattttcca ccatgatatt cggcaagcag gcatcgccat gggtcacgac 1680
gagatcctcg ccgtcgggca tgctcgcctt gagcctggcg aacagttcgg ctggcgcgag 1740
cccctgatgc tcttcgtcca gatcatcctg atcgacaaga ccggcttcca tccgagtacg 1800
tgctcgctcg atgcgatgtt tcgcttggtg gtcgaatggg caggtagccg gatcaagcgt 1860
atgcagccgc cgcattgcat cagccatgat ggatactttc tcggcaggag caaggtgaga 1920
tgacaggaga tcctgccccg gcacttcgcc caatagcagc cagtcccttc ccgcttcagt 1980
gacaacgtcg agcacagctg cgcaaggaac gcccgtcgtg gccagccacg atagccgcgc 2040
tgcctcgtct tgcagttcat tcagggcacc ggacaggtcg gtcttgacaa aaagaaccgg 2100
gcgcccctgc gctgacagcc ggaacacggc ggcatcagag cagccgattg tctgttgtgc 2160
ccagtcatag ccgaatagcc tctccaccca agcggccgga gaacctgcgt gcaatccatc 2220
ttgttcaatc atgcgaaacg atcctcatcc tgtctcttga tcagatcttg atcccctgcg 2280
ccatcagatc cttggcggcg agaaagccat ccagtttact ttgcagggct tcccaacctt 2340
accagagggc gccccagctg gcaattccgg ttcgcttgct gtccataaaa ccgcccagtc 2400
tagctatcgc catgtaagcc cactgcaagc tacctgcttt ctctttgcgc ttgcgttttc 2460
ccttgtccag atagcccagt agctgacatt catccggggt cagcaccgtt tctgcggact 2520
ggctttctac gtgaaaagga tctaggtgaa gatccttttt gataatctca tgaccaaaat 2580
cccttaacgt gagttttcgt tccactgagc gtcagacccc gtagaaaaga tcaaaggatc 2640
ttcttgagat cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa aaccaccgct 2700
accagcggtg gtttgtttgc cggatcaaga gctaccaact ctttttccga aggtaactgg 2760
cttcagcaga gcgcagatac caaatactgt ccttctagtg tagccgtagt taggccacca 2820
cttcaagaac tctgtagcac cgcctacata cctcgctctg ctaatcctgt taccagtggc 2880
tgctgccagt ggcgataagt cgtgtcttac cgggttggac tcaagacgat agttaccgga 2940
taaggcgcag cggtcgggct gaacgggggg ttcgtgcaca cagcccagct tggagcgaac 3000
gacctacacc gaactgagat acctacagcg tgagctatga gaaagcgcca cgcttcccga 3060
agggagaaag gcggacaggt atccggtaag cggcagggtc ggaacaggag agcgcacgag 3120
ggagcttcca gggggaaacg cctggtatct ttatagtcct gtcgggtttc gccacctctg 3180
acttgagcgt cgatttttgt gatgctcgtc aggggggcgg agcctatgga aaaacgccag 3240
caacgcggcc tttttacggt tcctgggctt ttgctggcct tttgctcaca tgttctttcc 3300
tgcgttatcc cctgattctg tggataaccg tattaccgcc tttgagtgag ctgataccgc 3360
tcgccgcagc cgaacgaccg agcgcagcga gtcagtgagc gaggaagcgg aag 3413



<210> 11
<211> 4961 
<212> DNA 
<213> Viral 



<400> 11

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300

ctagtaacgg ccgccagtgt gctggaattc atgggcagac ccgtctgtac tttaagagtg 360

ttggcaacca gtaatgaata aaaactcccg ttttattata tttgatgaat gctgaaagct 420

tacattaata tgtcgtgcga tggcacgaaa aaacacacgc aaacaataca ggggggtagt 480

cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc gaaaaatcaa gatctatatg 540

aattacactt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 600

ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct gccctggagt catttccttc 660

atccaatctt catccgagtt ggcgaggatt attgtaggct tagacttctt ctgcaccttt 720

ttcttcttac catacttggg gtttacaatg aaatccctct gacagccaac taactgtttc 780

caacaaggac agaatttaaa cggaatatca tctacgatgt tgtagattgc gtcttcgttg 840

tatgaagacc aatcaacatt attttgccag taattatgaa cccctaggct tctggcccaa 900

gtagattttc cggttcttgt tgggccgacg atgtagaggc tctgctttct tgatctttca 960

tctgatgact ggatacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 1020

aacaggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 1080

gccaatcgtt gattgactca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 1140

tgaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa tactgcaatt 1200

ttgtggacca atcaaagggg agctctttct ggatcatgga gaggtactct tctttggagg 1260

tagcgtgtga aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 1320

ctgaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 1380

cagccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 1440

tatttgggtg aaacccattt atatcaaaga accttgagtc agatatcctt atcggcttct 1500

ctggctgaag caatgcatgt aaatgcaaac ttccatcttt atgtgcctct cgggcacata 1560

gaatatattt gggaatccaa cgaacgacga gctcccagat catctgacag gcgatttcag 1620

gattttctgg acactttgga taggttagga acgtgttagc gttcctgtgt gagaactgac 1680

ggttggatga ggaggaggcc atagccgacg acggaggttg aggctgaggg atggcagact 1740

gggagctcca aactctatag tatacccgtg cgccttcgaa atccgccgct ccattgtctt 1800
atagtggttg taaatgggcc ggaccgggcc ggcccagcag gaaaagaagg cgcgcactaa 1860

tattaccgcg ccttcttttc ctgcgagggc ccggtaggga ccgagcgctt tgatttaaag 1920 

cctggttctg ctttgtatga tttatctaaa gcagcccaat ctaaagaaac cggtcccggg 1980 

cactataaat tgcctaacaa gtgcgattca ttcatggatc ctttaaactc gagtctagag 2040 

ggcccaattc gccctatagt gagtcgtatt acaattcact ggccgtcgtt ttacaacgtc 2100 

gtgactggga aaaccctggc gttacccaac ttaatcgcct tgcagcacat ccccctttcg 2160 

ccagctggcg taatagcgaa gaggcccgca ccgatcgccc ttcccaacag ttgcgcagcc 2220 

tatacgtacg gcagtttaag gtttacacct ataaaagaga gagccgttat cgtctgtttg 2280 

tggatgtaca gagtgatatt attgacacgc cggggcgacg gatggtgatc cccctggcca 2340 

gtgcacgtct gctgtcagat aaagtctccc gtgaacttta cccggtggtg catatcgggg 2400

atgaaagctg gcgcatgatg accaccgata tggccagtgt gccggtctcc gttatcgggg 2460 

aagaagtggc tgatctcagc caccgcgaaa atgacatcaa aaacgccatt aacctgatgt 2520 

tctggggaat ataaatgtca ggcctgaatg gcgaatggac gcgccctgta gcggcgcatt 2580 

aagcgcgcgg gtgtggtggt tacgcgcagc gtgaccgcta cacttgccag cgccctagcg 2640 

cccgctcctt tcgctttctt cccttccttt ctcgccacgt tcgccggctt tccccgtcaa 2700 

gctctaaatc gggggctccc tttagggttc cgatttagag ctttacggca cctcgaccgc 2760 

aaaaaacttg atttgggtga tggttcacgt agtgggccat cgccctgata gacggttttt 2820

cgccctttga cgttggagtc cacgttcttt aatagtggac tcttgttcca aactggaaca 2880 

acactcaacc ctatcgcggt ctattctttt gatttataag ggatgttgcc gatttcggcc 2940 

tattggttaa aaaatgagct gatttaacaa aaattttaac aaaattcaga agaactcgtc 3000 

aagaaggcga tagaaggcga tgcgctgcga atcgggagcg gcgataccgt aaagcacgag 3060 

gaagcggtca gcccattcgc cgccaagctc ttcagcaata tcacgggtag ccaacgctat 3120 

gtcctgatag cggtccgcca cacccagccg gccacagtcg atgaatccag aaaagcggcc 3180 

attttccacc atgatattcg gcaagcaggc atcgccatgg gtcacgacga gatcctcgcc 3240

gtcgggcatg ctcgccttga gcctggcgaa cagttcggct ggcgcgagcc cctgatgctc 3300 

ttcgtccaga tcatcctgat cgacaagacc ggcttccatc cgagtacgtg ctcgctcgat 3360 

gcgatgtttc gcttggtggt cgaatgggca ggtagccgga tcaagcgtat gcagccgccg 3420 

cattgcatca gccatgatgg atactttctc ggcaggagca aggtgagatg acaggagatc 3480 

ctgccccggc acttcgccca atagcagcca gtcccttccc gcttcagtga caacgtcgag 3540 

cacagctgcg caaggaacgc ccgtcgtggc cagccacgat agccgcgctg cctcgtcttg 3600 

cagttcattc agggcaccgg acaggtcggt cttgacaaaa agaaccgggc gcccctgcgc 3660 

tgacagccgg aacacggcgg catcagagca gccgattgtc tgttgtgccc agtcatagcc 3720 

gaatagcctc tccacccaag cggccggaga acctgcgtgc aatccatctt gttcaatcat 3780 

gcgaaacgat cctcatcctg tctcttgatc agatcttgat cccctgcgcc atcagatcct 3840

tggcggcgag aaagccatcc agtttacttt gcagggcttc ccaaccttac cagagggcgc 3900

cccagctggc aattccggtt cgcttgctgt ccataaaacc gcccagtcta gctatcgcca 3960 

tgtaagccca ctgcaagcta cctgctttct ctttgcgctt gcgttttccc ttgtccagat 4020 

agcccagtag ctgacattca tccggggtca gcaccgtttc tgcggactgg ctttctacgt 4080 

gaaaaggatc taggtgaaga tcctttttga taatctcatg accaaaatcc cttaacgtga 4140

gttttcgttc cactgagcgt cagaccccgt agaaaagatc aaaggatctt cttgagatcc 4200

tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt 4260

ttgtttgccg gatcaagagc taccaactct ttttccgaag gtaactggct tcagcagagc 4320

gcagatacca aatactgtcc ttctagtgta gccgtagtta ggccaccact tcaagaactc 4380

tgtagcaccg cctacatacc tcgctctgct aatcctgtta ccagtggctg ctgccagtgg 4440 

cgataagtcg tgtcttaccg ggttggactc aagacgatag ttaccggata aggcgcagcg 4500 

gtcgggctga acggggggtt cgtgcacaca gcccagcttg gagcgaacga cctacaccga 4560 

actgagatac ctacagcgtg agctatgaga aagcgccacg cttcccgaag ggagaaaggc 4620

ggacaggtat ccggtaagcg gcagggtcgg aacaggagag cgcacgaggg agcttccagg 4680 

gggaaacgcc tggtatcttt atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg 4740 

atttttgtga tgctcgtcag gggggcggag cctatggaaa aacgccagca acgcggcctt 4800 

tttacggttc ctgggctttt gctggccttt tgctcacatg ttctttcctg cgttatcccc 4860

tgattctgtg gataaccgta ttaccgcctt tgagtgagct gataccgctc gccgcagccg 4920

aacgaccgag cgcagcgagt cagtgagcga ggaagcggaa g 4961

<210> 12 
<211> 6309 
<212> DNA 
<213> Viral
<400> 12 

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60 

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120 

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180 

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240 

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300 

ctagtcccga tctagtaaca tagatgacac cgcgcgcgat aatttatcct agtttgcgcg 360 

ctatattttg ttttctatcg cgtattaaat gtataattgc gggactctaa tcataaaaac 420 

ccatctcata aataacgtca tgcattacat gttaattatt acatgcttaa cgtaattcaa 480 

cagaaattat atgataatca tcgacagacc ggcaacagga ttcaatctta agaaacttta 540 

ttgccaaatg tttgaacgat cggggaaatt cgctcgagtt aattaagcgg ccgcctcaaa 600 

aaggatcttc acctagatcc ttttaaatta aaaatgaagt tttagcacgt gtcagtcctg 660 

ctcctcggcc acgaagtgca cgcagttgcc ggccgggtcg cgcagggcga actcccgccc 720

ccacggctgc tcgccgatct cggtcatggc cggcccggag gcgtcccgga agttcgtgga 780 

cacgacctcc gaccactcgg cgtacagctc gtccaggccg cgcacccaca cccaggccag 840 

ggtgttgtcc ggcaccacct ggtcctggac cgcgctgatg aacagggtca cgtcgtcccg 900 

gaccacaccg gcgaagtcgt cctccacgaa gtcccgggag aacccgagcc ggtcggtcca 960 

gaactcgacc gctccggcga cgtcgcgcgc ggtgagcacc ggaacggcac tggtcaactt 1020 

ggccatggtg gccctcctca cgtgctatta ttgaagcatt tatcagggtt attgtctcat 1080 

gagcggatac atatttgaat gtatttagaa aaataaacaa ataggggttc cgcgcacatt 1140 

tccccgaaaa gtgccacctg tatgcggtgt gaaataccgc acagatgcgt aaggagaaaa 1200 

taccgcatca ggcgaaattg taaacgcggc cgcttaatta agtcgacgtc ctctccaaat 1260 

gaaatgaact tccttatata gaggaagggt cttgcgaagg atagtgggat tgtgcgtcat 1320

cccttacgtc agtggagata tcacatcaat ccacttgctt tgaagacgtg gttggaacgt 1380 

cttctttttc cacgtagctc ctcgtgggtg ggggtccatc tttgggacca ctgtcggcag 1440 

aggcatcttg aacgatagcc tttccttatc gcaatgatgg catttgtagg tgccaccttc 1500 

cttttctact gtccttttga tgaagtgaca gatagctggg caatggaatc cgaggaggtt 1560 

tcccgatatt accctttgtt gaaaagtctc aatagccctt tggtcttctg agactgtatc 1620 

tttgatattc ttggagtaga cgagagagtg tcgtgctcca ccatgttgac gaattcatgg 1680 

gcagacccgt ctgtacttta agagtgttgg caaccagtaa tgaataaaaa ctcccgtttt 1740 

attatatttg atgaatgctg aaagcttaca ttaatatgtc gtgcgatggc acgaaaaaac 1800 

acacgcaaac aatacagggg ggtagtcggc gggcggctaa gggtggtgct cggcgggcag 1860 

aacatcgaaa aatcaagatc tatatgaatt acacttcctc cgtaggagga agcacagggg 1920 

gagaatacca cttctccccc ggcgacataa tgtaaatgac gcagtttgcc tcgaaatact 1980 

ccagctgccc tggagtcatt tccttcatcc aatcttcatc cgagttggcg aggattattg 2040 

taggcttaga cttcttctgc acctttttct tcttaccata cttggggttt acaatgaaat 2100 

ccctctgaca gccaactaac tgtttccaac aaggacagaa tttaaacgga atatcatcta 2160 

cgatgttgta gattgcgtct tcgttgtatg aagaccaatc aacattattt tgccagtaat 2220 

tatgaacccc taggcttctg gcccaagtag attttccggt tcttgttggg ccgacgatgt 2280 

agaggctctg ctttcttgat ctttcatctg atgactggat acagaatcca tccattggag 2340 

gtcagaaatt gcatcctcga gggtataaca ggtaggttga aggagcatgt aagcttcggg 2400

actaacctgg aagatgttag gctggagcca atcgttgatt gactcattac aaagtaaatc 2460 

aggtgaggag ggtggatgag gattggtgaa ctcttcctga atctcaggaa aaagcttatt 2520 

tgcagagtat tcaaaatact gcaattttgt ggaccaatca aaggggagct ctttctggat 2580 

catggagagg tactcttctt tggaggtagc gtgtgaaata atgtctcgca ttatttcatc 2640 

tttagaaggc tttttttcct ttacctctga atcagatttt cctaggaagg gggacttcct 2700 

aggaatgaaa gtacctctct caaacacagc cagaggttcc ttgagaatgt aatccctcac 2760

tctgttaact gacttggcac tctgaatatt tgggtgaaac ccatttatat caaagaacct 2820

tgagtcagat atccttatcg gcttctctgg ctgaagcaat gcatgtaaat gcaaacttcc 2880

atctttatgt gcctctcggg cacatagaat atatttggga atccaacgaa cgacgagctc 2940 

ccagatcatc tgacaggcga tttcaggatt ttctggacac tttggatagg ttaggaacgt 3000 

gttagcgttc ctgtgtgaga actgacggtt ggatgaggag gaggccatag ccgacgacgg 3060 

aggttgaggc tgagggatgg cagactggga gctccaaact ctatagtata cccgtgcgcc 3120 

ttcgaaatcc gccgctccat tgtcttatag tggttgtaaa tgggccggac cgggccggcc 3180 

cagcaggaaa agaaggcgcg cactaatatt accgcgcctt cttttcctgc gagggcccgg 3240 
ggtagggacc gagcgctttg atttaaagcc tggttctgct ttgtatgatt tatctaaagc 3300

agcccaatct aaagaaaccg gtcccgggca ctataaattg cctaacaagt gcgattcatt 3360 

catggatcct ttaaactcga gtctagaggg cccaattcgc cctatagtga gtcgtattac 3420 

aattcactgg ccgtcgtttt acaacgtcgt gactgggaaa accctggcgt tacccaactt 3480

aatcgccttg cagcacatcc ccctttcgcc agctggcgta atagcgaaga ggcccgcacc 3540

gatcgccctt cccaacagtt gcgcagccta tacgtacggc agtttaaggt ttacacctat 3600 

aaaagagaga gccgttatcg tctgtttgtg gatgtacaga gtgatattat tgacacgccg 3660 

gggcgacgga tggtgatccc cctggccagt gcacgtctgc tgtcagataa agtctcccgt 3720 

gaactttacc cggtggtgca tatcggggat gaaagctggc gcatgatgac caccgatatg 3780

gccagtgtgc cggtctccgt tatcggggaa gaagtggctg atctcagcca ccgcgaaaat 3840

gacatcaaaa acgccattaa cctgatgttc tggggaatat aaatgtcagg cctgaatggc 3900

gaatggacgc gccctgtagc ggcgcattaa gcgcgcgggt gtggtggtta cgcgcagcgt 3960

gaccgctaca cttgccagcg ccctagcgcc cgctcctttc gctttcttcc cttcctttct 4020 

cgccacgttc gccggctttc cccgtcaagc tctaaatcgg gggctccctt tagggttccg 4080 
atttagagct ttacggcacc tcgaccgcaa aaaacttgat ttgggtgatg gttcacgtag 4140

tgggccatcg ccctgataga cggtttttcg ccctttgacg ttggagtcca cgttctttaa 4200 

tagtggactc ttgttccaaa ctggaacaac actcaaccct atcgcggtct attcttttga 4260 

tttataaggg atgttgccga tttcggccta ttggttaaaa aatgagctga tttaacaaaa 4320 

attttaacaa aattcagaag aactcgtcaa gaaggcgata gaaggcgatg cgctgcgaat 4380

cgggagcggc gataccgtaa agcacgagga agcggtcagc ccattcgccg ccaagctctt 4440 

cagcaatatc acgggtagcc aacgctatgt cctgatagcg gtccgccaca cccagccggc 4500 

cacagtcgat gaatccagaa aagcggccat tttccaccat gatattcggc aagcaggcat 4560 

cgccatgggt cacgacgaga tcctcgccgt cgggcatgct cgccttgagc ctggcgaaca 4620 

gttcggctgg cgcgagcccc tgatgctctt cgtccagatc atcctgatcg acaagaccgg 4680

cttccatccg agtacgtgct cgctcgatgc gatgtttcgc ttggtggtcg aatgggcagg 4740

tagccggatc aagcgtatgc agccgccgca ttgcatcagc catgatggat actttctcgg 4800

caggagcaag gtgagatgac aggagatcct gccccggcac ttcgcccaat agcagccagt 4860

cccttcccgc ttcagtgaca acgtcgagca cagctgcgca aggaacgccc gtcgtggcca 4920

gccacgatag ccgcgctgcc tcgtcttgca gttcattcag ggcaccggac aggtcggtct 4980

tgacaaaaag aaccgggcgc ccctgcgctg acagccggaa cacggcggca tcagagcagc 5040 

cgattgtctg ttgtgcccag tcatagccga atagcctctc cacccaagcg gccggagaac 5100 

ctgcgtgcaa tccatcttgt tcaatcatgc gaaacgatcc tcatcctgtc tcttgatcag 5160 

atcttgatcc cctgcgccat cagatccttg gcggcgagaa agccatccag tttactttgc 5220 

agggcttccc aaccttacca gagggcgccc cagctggcaa ttccggttcg cttgctgtcc 5280 

ataaaaccgc ccagtctagc tatcgccatg taagcccact gcaagctacc tgctttctct 5340 

ttgcgcttgc gttttccctt gtccagatag cccagtagct gacattcatc cggggtcagc 5400 

accgtttctg cggactggct ttctacgtga aaaggatcta ggtgaagatc ctttttgata 5460 

atctcatgac caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag 5520 

aaaagatcaa aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa 5580

caaaaaaacc accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt 5640

ttccgaaggt aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc 5700

cgtagttagg ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa 5760 

tcctgttacc agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa 5820 

gacgatagtt accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc 5880

ccagcttgga gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa 5940

gcgccacgct tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa 6000 

caggagagcg cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg 6060 

ggtttcgcca cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc 6120 

tatggaaaaa cgccagcaac gcggcctttt tacggttcct gggcttttgc tggccttttg 6180 

ctcacatgtt ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg 6240 

agtgagctga taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg 6300

aagcggaag 6309 



<210> 13 
<211> 8043 
<212> DNA
<213> Viral

<400> 13

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300

ctagtaacgg ccgccagtgt gctggaattc atgggcagac ccgtctgtac tttaagagtg 360

ttggcaacca gtaatgaata aaaactcccg ttttattaca tttgatgaat gctgaaagct 420 

tacattaata tgtcgtgcga tggcacgaaa aaacacacgc aaacaataca ggggggtagt 480

cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc gaaaaatcaa gatctatatg 540 

aattacactt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 600 

ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct gccctggagt catttccttc 660 

atccaatctt catccgagtt ggcgaggatt attgtaggct tagacttctt ctgcaccttt 720

ttcttcttac catacttggg gtttacaatg aaatccctct gacagccaac taactgtttc 780 

caacaaggac agaatttaaa cggaatatca tctacgatgt tgtagattgc gtcttcgttg 840 

tatgaagacc aatcaacatt attttgccag taattatgaa cccctaggct tctggcccaa 900 

gtagattttc cggttcttgt tgggccgacg atgtagaggc tctgctttct tgatctttca 960 

tctgatgact ggatacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 1020 

aacaggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 1080 

gccaatcgtt gattgactca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 1140

tgaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa tactgcaatt 1200 

ttgtggacca atcaaagggg agctctttct ggatcatgga gaggtactct tctttggagg 1260

tagcgtgtga aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 1320

ctgaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 1380 

cagccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 1440 

tatttgggtg aaacccattt atatcaaaga accttgagtc agatatcctt atcggcttct 1500 

ctggctgaag caatgcatgt aaatgcaaac ttccatcttt atgtgcctct cgggcacata 1560 

gaatatattt gggaatccaa cgaacgacga gctcccagat catctgacag gcgatttcag 1620 

gattttctgg acactttgga taggttagga acgtgttagc gttcctgtgt gagaactgac 1680 

ggttggatga ggaggaggcc atagccgacg acggaggttg aggctgaggg atggcagact 1740 

gggagctcca aactctatag tatacccgtg cgccttcgaa atccgccgct ccattgtctt 1800 

atagtggttg taaatgggcc ggaccgggcc ggcccagcag gaaaagaagg cgcgcactaa 1860 

tattaccgcg ccttcttttc ctgcgagggc ccggtaggga ccgagcgctt tgatttaaag 1920 

cctggttctg ctttgtatga tttatctaaa gcagcccaat ctaaagaaac cggtcccggg 1980 

cactataaat tgcctaacaa gtgcgattca ttcatggatc ctttaaactc gagtctagtc 2040 

ccgatctagt aacatagatg acaccgcgcg cgataattta tcctagtttg cgcgctatat 2100 

tttgttttct atcgcgtatt aaatgtataa ttgcgggact ctaatcataa aaacccatct 2160 

cataaataac gtcatgcatt acatgttaat tattacatgc ttaacgtaat tcaacagaaa 2220 

ttatatgata atcatcgaca gaccggcaac aggattcaat cttaagaaac tttattgcca 2280 

aatgtttgaa cgatcgggga aattcgctcg agttaattaa gcggccgcct caaaaaggat 2340 

cttcacctag atccttttaa attaaaaatg aagttttagc acgtgtcagt cctgctcctc 2400 

ggccacgaag tgcacgcagt tgccggccgg gtcgcgcagg gcgaactccc gcccccacgg 2460 

ctgctcgccg atctcggtca tggccggccc ggaggcgtcc cggaagttcg tggacacgac 2520 

ctccgaccac tcggcgtaca gctcgtccag gccgcgcacc cacacccagg ccagggtgtt 2580 

gtccggcacc acctggtcct ggaccgcgct gatgaacagg gtcacgtcgt cccggaccac 2640 

accggcgaag tcgtcctcca cgaagtcccg ggagaacccg agccggtcgg tccagaactc 2700 

gaccgctccg gcgacgtcgc gcgcggtgag caccggaacg gcactggtca acttggccat 2760 

ggtggccctc ctcacgtgct attattgaag catttatcag ggttattgtc tcatgagcgg 2820 

atacatattt gaatgtattt agaaaaataa acaaataggg gttccgcgca catttccccg 2880 

aaaagtgcca cctgtatgcg gtgtgaaata ccgcacagat gcgtaaggag aaaataccgc 2940 

atcaggcgaa attgtaaacg cggccgctta attaagtcga cgtcctctcc aaatgaaatg 3000 

aacttcctta tatagaggaa gggtcttgcg aaggatagtg ggattgtgcg tcatccctta 3060 

cgtcagtgga gatatcacat caatccactt gctttgaaga cgtggttgga acgtcttctt 3120 

tttccacgta gctcctcgtg ggtgggggtc catctttggg accactgtcg gcagaggcat 3180 
cttgaacgat agcctttcct tatcgcaatg atggcatttg taggtgccac cttccttttc 3240 

tactgtcctt ttgatgaagt gacagatagc tgggcaatgg aatccgagga ggtttcccga 3300 

tattaccctt tgttgaaaag tctcaatagc cctttggtct tctgagactg tatctttgat 3360 

attcttggag tagacgagag agtgtcgtgc tccaccatgt tgacgaattc atgggcagac 3420 

ccgtctgtac tttaagagtg ttggcaacca gtaatgaata aaaactcccg ttttattata 3480 

tttgatgaat gctgaaagct tacattaata tgtcgtgcga tggcacgaaa aaacacacgc 3540 

aaacaataca ggggggtagt cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc 3600 

gaaaaatcaa gatctatatg aattacactt cctccgtagg aggaagcaca gggggagaat 3660 

accacttctc ccccggcgac ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct 3720 

gccctggagt catttccttc atccaatctt catccgagtt ggcgaggatt attgtaggct 3780 

tagacttctt ctgcaccttt ttcttcttac catacttggg gtttacaatg aaatccctct 3840 

gacagccaac taactgtttc caacaaggac agaatttaaa cggaatatca tctacgatgt 3900 

tgtagattgc gtcttcgttg tatgaagacc aatcaacatt attttgccag taattatgaa 3960 

cccctaggct tctggcccaa gtagattttc cggttcttgt tgggccgacg atgtagaggc 4020 

tctgctttct tgatctttca tctgatgact ggatacagaa tccatccatt ggaggtcaga 4080 

aattgcatcc tcgagggtat aacaggtagg ttgaaggagc atgtaagctt cgggactaac 4140 

ctggaagatg ttaggctgga gccaatcgtt gattgactca ttacaaagta aatcaggtga 4200 

ggagggtgga tgaggattgg tgaactcttc ctgaatctca ggaaaaagct tatttgcaga 4260 

gtattcaaaa tactgcaatt ttgtggacca atcaaagggg agctctttct ggatcatgga 4320 

gaggtactct tctttggagg tagcgtgtga aataatgtct cgcattattt catctttaga 4380 

aggctttttt tcctttacct ctgaatcaga ttttcctagg aagggggact tcctaggaat 4440 

gaaagtacct ctctcaaaca cagccagagg ttccttgaga atgtaatccc tcactctgtt 4500 

aactgacttg gcactctgaa tatttgggtg aaacccattt atatcaaaga accttgagtc 4560 

agatatcctt atcggcttct ctggctgaag caatgcatgt aaatgcaaac ttccatcttt 4620 

atgtgcctct cgggcacata gaatatattt gggaatccaa cgaacgacga gctcccagat 4680 

catctgacag gcgatttcag gattttctgg acactttgga taggttagga acgtgttagc 4740 

gttcctgtgt gagaactgac ggttggatga ggaggaggcc atagccgacg acggaggttg 4800 

aggctgaggg atggcagact gggagctcca aactctatag tatacccgtg cgccttcgaa 4860 

atccgccgct ccattgtctt atagtggttg taaatgggcc ggaccgggcc ggcccagcag 4920 

gaaaagaagg cgcgcactaa tattaccgcg ccttcttttc ctgcgagggc ccggggtagg 4980 

gaccgagcgc tttgatttaa agcctggttc tgctttgtat gatttatcta aagcagccca 5040 

atctaaagaa accggtcccg ggcactataa attgcctaac aagtgcgatt cattcatgga 5100 

tcctttaaac tcgagtctag agggcccaat tcgccctata gtgagtcgta ttacaattca 5160 

ctggccgtcg ttttacaacg tcgtgactgg gaaaaccctg gcgttaccca acttaatcgc 5220 

cttgcagcac atcccccttt cgccagctgg cgtaatagcg aagaggcccg caccgatcgc 5280 

ccttcccaac agttgcgcag cctatacgta cggcagttta aggtttacac ctataaaaga 5340 

gagagccgtt atcgtctgtt tgtggatgta cagagtgata ttattgacac gccggggcga 5400 

cggatggtga tccccctggc cagtgcacgt ctgctgtcag ataaagtctc ccgtgaactt 5460 

tacccggtgg tgcatatcgg ggatgaaagc tggcgcatga tgaccaccga tatggccagt 5520 

gtgccggtct ccgttatcgg ggaagaagtg gctgatctca gccaccgcga aaatgacatc 5580 

aaaaacgcca ttaacctgat gttctgggga atataaatgt caggcctgaa tggcgaatgg 5640 

acgcgccctg tagcggcgca ttaagcgcgc gggtgtggtg gttacgcgca gcgtgaccgc 5700 

tacacttgcc agcgccctag cgcccgctcc tttcgctttc ttcccttcct ttctcgccac 5760 

gttcgccggc tttccccgtc aagctctaaa tcgggggctc cctttagggt tccgatttag 5820 

agctttacgg cacctcgacc gcaaaaaact tgatttgggt gatggttcac gtagtgggcc 5880 

atcgccctga tagacggttt ttcgcccttt gacgttggag tccacgttct ttaatagtgg 5940 

actcttgttc caaactggaa caacactcaa ccctatcgcg gtctattctt ttgatttata 6000 

agggatgttg ccgatttcgg cctattggtt aaaaaatgag ctgatttaac aaaaatttta 6060 

acaaaattca gaagaactcg tcaagaaggc gatagaaggc gatgcgctgc gaatcgggag 6120 

cggcgatacc gtaaagcacg aggaagcggt cagcccattc gccgccaagc tcttcagcaa 6180 

tatcacgggt agccaacgct atgtcctgat agcggtccgc cacacccagc cggccacagt 6240 

cgatgaatcc agaaaagcgg ccattttcca ccatgatatt cggcaagcag gcatcgccat 6300 

gggtcacgac gagatcctcg ccgtcgggca tgctcgcctt gagcctggcg aacagttcgg 6360 

ctggcgcgag cccctgatgc tcttcgtcca gatcatcctg atcgacaaga ccggcttcca 6420 

tccgagtacg tgctcgctcg atgcgatgtt tcgcttggtg gtcgaatggg caggtagccg 6480 

gatcaagcgt atgcagccgc cgcattgcat cagccatgat ggatactttc tcggcaggag 6540 

caaggtgaga tgacaggaga tcctgccccg gcacttcgcc caatagcagc cagtcccttc 6600 
ccgcttcagt gacaacgtcg agcacagctg cgcaaggaac gcccgtcgtg gccagccacg 6660 

atagccgcgc tgcctcgtct tgcagttcat tcagggcacc ggacaggtcg gtcttgacaa 6720 

aaagaaccgg gcgcccctgc gctgacagcc ggaacacggc ggcatcagag cagccgattg 6780 

tctgttgtgc ccagtcatag ccgaatagcc tctccaccca agcggccgga gaacctgcgt 6840 

gcaatccatc ttgttcaatc atgcgaaacg atcctcatcc tgtctcttga tcagatcttg 6900 

atcccctgcg ccatcagatc cttggcggcg agaaagccat ccagtttact ttgcagggct 6960 

tcccaacctt accagagggc gccccagctg gcaattccgg ttcgcttgct gtccataaaa 7020 

ccgcccagtc tagctatcgc catgtaagcc cactgcaagc tacctgcttt ctctttgcgc 7080 

ttgcgttttc ccttgtccag atagcccagt agctgacatt catccggggt cagcaccgtt 7140 

tctgcggact ggctttctac gtgaaaagga tctaggtgaa gatccttttt gataatctca 7200 

tgaccaaaat cccttaacgt gagttttcgt tccactgagc gtcagacccc gtagaaaaga 7260 

tcaaaggatc ttcttgagat cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa 7320 

aaccaccgct accagcggtg gtttgtttgc cggatcaaga gctaccaact ctttttccga 7380 

aggtaactgg cttcagcaga gcgcagatac caaatactgt ccttctagtg tagccgtagt 7440 

taggccacca cttcaagaac tctgtagcac cgcctacata cctcgctctg ctaatcctgt 7500 

taccagtggc tgctgccagt ggcgataagt cgtgtcttac cgggttggac tcaagacgat 7560 

agttaccgga taaggcgcag cggtcgggct gaacgggggg ttcgtgcaca cagcccagct 7620 

tggagcgaac gacctacacc gaactgagat acctacagcg tgagctatga gaaagcgcca 7680 

cgcttcccga agggagaaag gcggacaggt atccggtaag cggcagggtc ggaacaggag 7740 

agcgcacgag ggagcttcca gggggaaacg cctggtatct ttatagtcct gtcgggtttc 7800 

gccacctctg acttgagcgt cgatttttgt gatgctcgtc aggggggcgg agcctatgga 7860 

aaaacgccag caacgcggcc tttttacggt tcctgggctt ttgctggcct tttgctcaca 7920 

tgttctttcc tgcgttatcc cctgattctg tggataaccg tattaccgcc tttgagtgag 7980 

ctgataccgc tcgccgcagc cgaacgaccg agcgcagcga gtcagtgagc gaggaagcgg 8040 

aag 8043 



<210> 14 
<211> 7404 
<212> DNA 
<213> Viral 

<400> 14 

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60 

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120 

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180 

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240 

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300 

ctagtaacgg ccgccagtgt gctggaattc atgggcagac ccgtctgtac tttaagagtg 360 

ttggcaacca gtaatgaata aaaactcccg ttttattata tttgatgaat gctgaaagct 420 

tacattaata tgtcgtgcga tggcacgaaa aaacacacgc aaacaataca ggggggtagt 480 

cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc gaaaaatcaa gatctatatg 540 

aattacactt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 600 

ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct gccctggagt catttccttc 660 

atccaatctt catccgagtt ggcgaggatt attgtaggct tagacttctt ctgcaccttt 720 

ttcttcttac catacttggg gtttacaatg aaatccctct gacagccaac taactgtttc 780 

caacaaggac agaatttaaa cggaatatca cctacgatgt tgtagattgc gtcttcgttg 840 

tatgaagacc aatcaacatt attttgccag taattatgaa cccctaggct tctggcccaa 900 

gtagattttc cggttcttgt tgggccgacg atgtagaggc tctgctttct tgatctttca 960 

tctgatgact ggatacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 1020 

aacaggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 1080 

gccaatcgtt gattgactca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 1140 

tgaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa tactgcaatt 1200 

ttgtggacca atcaaagggg agctctttct ggatcatgga gaggtactct tctttggagg 1260 

tagcgtgtga aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 1320 

ctgaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 1380 
cagccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 1440 

tatttgggtg aaacccattt atatcaaaga accttgagtc agatatcctt atcggcttct 1500 

ctggctgaag caatgcatgt aaatgcaaac ttccatcttt atgtgcctct cgggcacata 1560 

gaatatattt gggaatccaa cgaacgacga gctcccagat catctgacag gcgatttcag 1620 

gattttctgg acactttgga taggttagga acgtgttagc gttcctgtgt gagaactgac 1680 

ggttggatga ggaggaggcc atagccgacg acggaggttg aggctgaggg atggcagact 1740 

gggagctcca aactctatag tatacccgtg cgccttcgaa atccgccgct ccattgtctt 1800 

atagtggttg taaatgggcc ggaccgggcc ggcccagcag gaaaagaagg cgcgcactaa 1860 

tattaccgcg ccttcttttc ctgcgagggc ccggtaggga ccgagcgctt tgatttaaag 1920 

cctggttctg ctttgtatga tttatctaaa gcagcccaat ctaaagaaac cggtcccggg 1980 

cactataaat tgcctaacaa gtgcgattca ttcatggatc ctttaaactc gagtctagtc 2040 

ccgatctagt aacatagatg acaccgcgcg cgataattta tcctagtttg cgcgctatat 2100 

tttgttttct atcgcgtatt aaatgtataa ttgcgggact ctaatcataa aaacccatct 2160 

cataaataac gtcatgcatt acatgttaat tattacatgc ttaacgtaat tcaacagaaa 2220 

ttatatgata atcatcgaca gaccggcaac aggattcaat cttaagaaac tttattgcca 2280 

aatgtttgaa cgatcgggga aattcgctcg agttaattaa gcggccgctt aattaagtcg 2340 

acgtcctctc caaatgaaat gaacttcctt atatagagga agggtcttgc gaaggatagt 2400 

gggattgtgc gtcatccctt acgtcagtgg agatatcaca tcaatccact tgctttgaag 2460 

acgtggttgg aacgtcttct ttttccacgt agctcctcgt gggtgggggt ccatctttgg 2520 

gaccactgtc ggcagaggca tcttgaacga tagcctttcc ttatcgcaat gatggcattt 2580 

gtaggtgcca ccttcctttt ctactgtcct tttgatgaag tgacagatag ctgggcaatg 2640 

gaatccgagg aggtttcccg atattaccct ttgttgaaaa gtctcaatag ccctttggtc 2700 

ttctgagact gtatctttga tattcttgga gtagacgaga gagtgtcgtg ctccaccatg 2760 

ttgacgaatt catgggcaga cccgtctgta ctttaagagt gttggcaacc agtaatgaat 2820 

aaaaactccc gttttattat atttgatgaa tgctgaaagc ttacattaat atgtcgtgcg 2880 

atggcacgaa aaaacacacg caaacaatac aggggggtag tcggcgggcg gctaagggtg 2940 

gtgctcggcg ggcagaacat cgaaaaatca agatctatat gaattacact tcctccgtag 3000 

gaggaagcac agggggagaa taccacttct cccccggcga cataatgtaa atgacgcagt 3060 

ttgcctcgaa atactccagc tgccctggag tcatttcctt catccaatct tcatccgagt 3120 

tggcgaggat tattgtaggc ttagacttct tctgcacctt tttcttctta ccatacttgg 3180 

ggtttacaat gaaatccctc tgacagccaa ctaactgttt ccaacaagga cagaatttaa 3240 

acggaatatc atctacgatg ttgtagattg cgtcttcgtt gtatgaagac caatcaacat 3300 

tattttgcca gtaattatga acccctaggc ttctggccca agtagatttt ccggttcttg 3360 

ttgggccgac gatgtagagg ctctgctttc ttgatctttc atctgatgac tggatacaga 3420 

atccatccat tggaggtcag aaattgcatc ctcgagggta taacaggtag gttgaaggag 3480 

catgtaagct tcgggactaa cctggaagat gttaggctgg agccaatcgt tgattgactc 3540 

attacaaagt aaatcaggtg aggagggtgg atgaggattg gtgaactctt cctgaatctc 3600 

aggaaaaagc ttatttgcag agtattcaaa atactgcaat tttgtggacc aatcaaaggg 3660 

gagctctttc tggatcatgg agaggtactc ttctttggag gtagcgtgtg aaataatgtc 3720 

tcgcattatt tcatctttag aaggcttttt ttcctttacc tctgaatcag attttcctag 3780 

gaagggggac ttcctaggaa tgaaagtacc tctctcaaac acagccagag gttccttgag 3840 

aatgtaatcc ctcactctgt taactgactt ggcactctga atatttgggt gaaacccatt 3900 

tatatcaaag aaccttgagt cagatatcct tatcggcttc tctggctgaa gcaatgcatg 3960 

taaatgcaaa cttccatctt tatgtgcctc tcgggcacat agaatatatt tgggaatcca 4020 

acgaacgacg agctcccaga tcatctgaca ggcgatttca ggattttctg gacactttgg 4080 

ataggttagg aacgtgttag cgttcctgtg tgagaactga cggttggatg aggaggaggc 4140 

catagccgac gacggaggtt gaggctgagg gatggcagac tgggagctcc aaactctata 4200 

gtatacccgt gcgccttcga aatccgccgc tccattgtct tatagtggtt gtaaatgggc 4260 

cggaccgggc cggcccagca ggaaaagaag gcgcgcacta atattaccgc gccttctttt 4320 

cctgcgaggg cccggggtag ggaccgagcg ctttgattta aagcctggtt ctgctttgta 4380 

tgatttatct aaagcagccc aatctaaaga aaccggtccc gggcactata aattgcctaa 4440 

caagtgcgat tcattcatgg atcctttaaa ctcgagtcta gagggcccaa ttcgccctat 4500 

agtgagtcgt attacaattc actggccgtc gttttacaac gtcgtgactg ggaaaaccct 4560 

ggcgttaccc aacttaatcg ccttgcagca catccccctt tcgccagctg gcgtaatagc 4620 

gaagaggccc gcaccgatcg cccttcccaa cagttgcgca gcctatacgt acggcagttt 4680 

aaggtttaca cctataaaag agagagccgt tatcgtctgt ttgtggatgt acagagtgat 4740 

attattgaca cgccggggcg acggatggtg atccccctgg ccagtgcacg tctgctgtca 4800 
gataaagtct cccgtgaact ttacccggtg gtgcatatcg gggatgaaag ctggcgcatg 4860 

atgaccaccg atatggccag tgtgccggtc tccgttatcg gggaagaagt ggctgatctc 4920 

agccaccgcg aaaatgacat caaaaacgcc attaacctga tgttctgggg aatataaatg 4980 

tcaggcctga atggcgaatg gacgcgccct gtagcggcgc attaagcgcg cgggtgtggt 5040 

ggttacgcgc agcgtgaccg ctacacttgc cagcgcccta gcgcccgctc ctttcgcttt 5100 

cttcccttcc tttctcgcca cgttcgccgg ctttccccgt caagctctaa atcgggggct 5160 

ccctttaggg ttccgattta gagctttacg gcacctcgac cgcaaaaaac ttgatttggg 5220 

tgatggttca cgtagtgggc catcgccctg atagacggtt tttcgccctt tgacgttgga 5280 

gtccacgttc tttaatagtg gactcttgtt ccaaactgga acaacactca accctatcgc 5340 

ggtctattct tttgatttat aagggatgtt gccgatttcg gcctattggt taaaaaatga 5400 

gctgatttaa caaaaatttt aacaaaattc agaagaactc gtcaagaagg cgatagaagg 5460 

cgatgcgctg cgaatcggga gcggcgatac cgtaaagcac gaggaagcgg tcagcccatt 5520 

cgccgccaag ctcttcagca atatcacggg tagccaacgc tatgtcctga tagcggtccg 5580 

ccacacccag ccggccacag tcgatgaatc cagaaaagcg gccattttcc accatgatat 5640 

tcggcaagca ggcatcgcca tgggtcacga cgagatcctc gccgtcgggc atgctcgcct 5700 

tgagcctggc gaacagttcg gctggcgcga gcccctgatg ctcttcgtcc agatcatcct 5760 

gatcgacaag accggcttcc atccgagtac gtgctcgctc gatgcgatgt ttcgcttggt 5820 

ggtcgaatgg gcaggtagcc ggatcaagcg tatgcagccg ccgcattgca tcagccatga 5880 

tggatacttt ctcggcagga gcaaggtgag atgacaggag atcctgcccc ggcacttcgc 5940 

ccaatagcag ccagtccctt cccgcttcag tgacaacgtc gagcacagct gcgcaaggaa 6000 

cgcccgtcgt ggccagccac gatagccgcg ctgcctcgtc ttgcagttca ttcagggcac 6060 

cggacaggtc ggtcttgaca aaaagaaccg ggcgcccctg cgctgacagc cggaacacgg 6120 

cggcatcaga gcagccgatt gtctgttgtg cccagtcata gccgaatagc ctctccaccc 6180 

aagcggccgg agaacctgcg tgcaatccat cttgttcaat catgcgaaac gatcctcatc 6240 

ctgtctcttg atcagatctt gatcccctgc gccatcagat ccttggcggc gagaaagcca 6300 

tccagtttac tttgcagggc ttcccaacct taccagaggg cgccccagct ggcaattccg 6360 

gttcgcttgc tgtccataaa accgcccagt ctagctatcg ccatgtaagc ccactgcaag 6420 

ctacctgctt tctctttgcg cttgcgtttt cccttgtcca gatagcccag tagctgacat 6480 

tcatccgggg tcagcaccgt ttctgcggac tggctttcta cgtgaaaagg atctaggtga 6540 

agatcctttt tgataatctc atgaccaaaa tcccttaacg tgagttttcg ttccactgag 6600 

cgtcagaccc cgtagaaaag atcaaaggat cttcttgaga tccttttttt ctgcgcgtaa 6660 

tctgctgctt gcaaacaaaa aaaccaccgc taccagcggt ggtttgtttg ccggatcaag 6720 

agctaccaac tctttttccg aaggtaactg gcttcagcag agcgcagata ccaaatactg 6780 

tccttctagt gtagccgtag ttaggccacc acttcaagaa ctctgtagca ccgcctacat 6840 

acctcgctct gctaatcctg ttaccagtgg ctgctgccag tggcgataag tcgtgtctta 6900 

ccgggttgga ctcaagacga tagttaccgg ataaggcgca gcggtcgggc tgaacggggg 6960 

gttcgtgcac acagcccagc ttggagcgaa cgacctacac cgaactgaga tacctacagc 7020 

gtgagctatg agaaagcgcc acgcttcccg aagggagaaa ggcggacagg tatccggtaa 7080 

gcggcagggt cggaacagga gagcgcacga gggagcttcc agggggaaac gcctggtatc 7140 

tttatagtcc tgtcgggttt cgccacctct gacttgagcg tcgatttttg tgatgctcgt 7200 

caggggggcg gagcctatgg aaaaacgcca gcaacgcggc ctttttacgg ttcctgggct 7260 

tttgctggcc ttttgctcac atgttctttc ctgcgttatc ccctgattct gtggataacc 7320 

gtattaccgc ctttgagtga gctgataccg ctcgccgcag ccgaacgacc gagcgcagcg 7380 

agtcagtgag cgaggaagcg gaag 7404 